
Hoppers AI Team · March 12, 2026 · 12 min read

    Design a Chat System — Complete System Design Walkthrough

    Chat systems are among the most frequently asked system design questions at top tech companies. Building one requires careful consideration of real-time communication, data consistency, presence management, and horizontal scalability. In this walkthrough, we will design a chat system from scratch, covering each stage of a structured system design interview.

[Figure: Chat System — High-Level Architecture. Clients (mobile/web) hold WebSocket connections to the Gateway; the Chat Service persists messages to the Message Store and publishes to a Message Queue for asynchronous fan-out; REST traffic flows through the API Gateway to the Group Service and Media Store; the Presence Service and Push Notification Service complete the picture. The diagram distinguishes synchronous (WS/REST) paths from asynchronous (queue) paths.]

    Stage 1: Requirements Gathering

    Before drawing a single box, clarify scope with your interviewer. This is where most candidates differentiate themselves. Spend 3-5 minutes here.

    Functional Requirements

    • 1-on-1 messaging — Real-time text message delivery between two users.
    • Group chat — Support groups of up to 500 members with the same real-time delivery guarantees.
    • Online presence — Show whether a user is online, offline, or idle.
    • Read receipts — Indicate when a message has been delivered and read by the recipient(s).
    • Message history — Persistent storage with paginated retrieval of past conversations.
    • Media sharing — Support for images and files (we will address the upload path but not build a full CDN).

    Non-Functional Requirements

    • Scale: 50 million DAU, 500 million messages per day (~6,000 messages/second average, ~18,000 peak).
    • Latency: End-to-end message delivery under 300ms for online recipients (same region).
    • Availability: 99.99% uptime — chat is a core communication channel.
    • Consistency: Messages within a conversation must be ordered. Eventual consistency across read receipts and presence is acceptable.
    • Storage: Retain messages for at least 5 years. Average message size ~200 bytes, leading to roughly 100 GB of new message data per day before replication.
    Interview tip: Always state your scale numbers explicitly. It shows the interviewer you understand that design decisions are driven by scale, and it gives you concrete numbers to reference during capacity estimation later.

    Stage 2: API Design

    A chat system uses two communication channels: WebSocket for real-time bidirectional messaging and REST for stateless operations like fetching history or managing groups.

    WebSocket Events (Real-Time Channel)

Direction | Event | Payload
Client → Server | send_message | { conversationId, content, contentType, clientMsgId }
Server → Client | new_message | { messageId, conversationId, senderId, content, contentType, timestamp }
Client → Server | typing | { conversationId }
Server → Client | typing_indicator | { conversationId, userId }
Client → Server | ack | { messageId }
Server → Client | read_receipt | { conversationId, userId, lastReadMessageId }
Server → Client | presence_update | { userId, status, lastSeen }

    The clientMsgId field is a UUID generated by the client. It serves as an idempotency key to prevent duplicate messages on retry, and lets the client correlate its optimistic UI update with the server-confirmed messageId.
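To make this concrete, here is a minimal sketch of idempotent send handling, assuming a Redis-backed dedup cache. The key names, the 24-hour TTL, and the persist_message stub are illustrative assumptions, not part of the design above.

```python
import uuid

import redis

r = redis.Redis()

def persist_message(message_id: str, event: dict) -> None:
    ...  # write to Cassandra (elided)

def handle_send_message(event: dict) -> dict:
    dedup_key = f"dedup:{event['conversationId']}:{event['clientMsgId']}"
    message_id = str(uuid.uuid1())  # stand-in for a Snowflake ID
    # SET NX succeeds only on the first attempt; retries of the same
    # clientMsgId fall through and return the originally assigned ID.
    if not r.set(dedup_key, message_id, nx=True, ex=86400):
        return {"messageId": r.get(dedup_key).decode(), "duplicate": True}
    persist_message(message_id, event)
    return {"messageId": message_id, "duplicate": False}
```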

    REST Endpoints

Method | Endpoint | Purpose
GET | /v1/conversations/{id}/messages?cursor=&limit=50 | Paginated message history (cursor-based)
GET | /v1/conversations | List user's conversations with last message preview
POST | /v1/conversations | Create group or 1-on-1 conversation
PUT | /v1/conversations/{id}/members | Add/remove group members
POST | /v1/media/upload | Upload media, returns a media URL
    Design decision: Why not use REST for sending messages? WebSocket connections are already open for receiving messages. Sending over the same channel avoids an extra HTTP round-trip and lets us push delivery confirmations immediately. REST is reserved for operations that do not require real-time semantics.

    Stage 3: Data Model

    The choice of storage is critical. Chat data has a clear access pattern: most reads are sequential within a conversation, and writes are append-heavy. This makes wide-column stores like Apache Cassandra or ScyllaDB ideal for the message store, while user and conversation metadata fits well in a relational database like PostgreSQL.

    Messages Table (Cassandra)

Column | Type | Role
conversation_id | UUID | Partition key
message_id | TimeUUID / Snowflake ID | Clustering key (DESC)
sender_id | UUID |
content | text |
content_type | enum | text, image, file
created_at | timestamp |
client_msg_id | UUID | Idempotency key

    Partitioning by conversation_id ensures all messages in a conversation are co-located. The clustering key orders messages by time, making paginated history retrieval a single partition scan.
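In CQL terms, the table and a cursor-based history read might look like the sketch below, using the DataStax Python driver. The keyspace, contact point, and column selection are illustrative, and content_type is stored as text because CQL has no enum type.

```python
import time
import uuid

from cassandra.cluster import Cluster
from cassandra.util import uuid_from_time

session = Cluster(["127.0.0.1"]).connect("chat")  # assumes the keyspace exists

session.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        conversation_id uuid,
        message_id      timeuuid,
        sender_id       uuid,
        content         text,
        content_type    text,      -- 'text' | 'image' | 'file'
        created_at      timestamp,
        client_msg_id   uuid,
        PRIMARY KEY ((conversation_id), message_id)
    ) WITH CLUSTERING ORDER BY (message_id DESC)
""")

# Cursor-based pagination: the 50 messages immediately older than the cursor.
conversation_id = uuid.uuid4()
cursor = uuid_from_time(time.time())  # "now", expressed as a timeuuid cursor
page = session.execute(
    "SELECT message_id, sender_id, content FROM messages "
    "WHERE conversation_id = %s AND message_id < %s LIMIT 50",
    (conversation_id, cursor),
)
```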

    Conversations Table (PostgreSQL)

Column | Type
conversation_id | UUID (PK)
type | enum (direct, group)
name | text (nullable, for groups)
created_at | timestamp
last_message_at | timestamp
last_message_preview | text

    Participants Table (PostgreSQL)

Column | Type
conversation_id | UUID (FK)
user_id | UUID (FK)
joined_at | timestamp
last_read_message_id | UUID (nullable)
role | enum (member, admin)

    The last_read_message_id in the participants table is the backbone of read receipts. When a user opens a conversation, the client sends an ack with the latest message ID, and we update this column. To render receipts, we simply query all participants for a conversation and compare their last_read_message_id against each message.
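A sketch of both sides of that exchange against the schema above, assuming psycopg 3 (connection handling is elided):

```python
import psycopg

def mark_read(conn: psycopg.Connection, conversation_id, user_id, last_read_message_id) -> None:
    # Advance the reader's watermark; this single column powers read receipts.
    conn.execute(
        """UPDATE participants
              SET last_read_message_id = %s
            WHERE conversation_id = %s AND user_id = %s""",
        (last_read_message_id, conversation_id, user_id),
    )

def receipt_watermarks(conn: psycopg.Connection, conversation_id) -> list:
    # One watermark per participant; the client compares each message_id
    # against these to render delivered/read state.
    return conn.execute(
        """SELECT user_id, last_read_message_id
             FROM participants
            WHERE conversation_id = %s""",
        (conversation_id,),
    ).fetchall()
```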

    Storage Choice Rationale

    • Cassandra for messages: Append-only writes, time-range queries, and horizontal scalability make it a natural fit. At 500M messages/day, a relational DB would struggle with write throughput on a single table.
    • PostgreSQL for metadata: Conversations and participants are low-volume, highly relational data. Joins (list conversations for a user with last message) are natural here.
    • Redis for presence and sessions: Ephemeral data with high read/write frequency. TTL-based expiration maps perfectly to online/offline detection.

    Stage 4: High-Level Architecture

    Let us trace the lifecycle of a message from sender to recipient.

    Message Delivery Flow

    1. Client sends a send_message event over its WebSocket connection to the WebSocket Gateway.
    2. The Gateway forwards the message to the Chat Service, which validates the payload, generates a message_id (Snowflake ID for global ordering), and writes it to the Message Store (Cassandra).
    3. The Chat Service updates the conversation's last_message_at and last_message_preview in PostgreSQL.
    4. The Chat Service publishes the message to a Message Queue (Kafka or Redis Streams), partitioned by conversation_id.
5. Fan-out Workers consume from the queue (a worker sketch follows this list). For each recipient in the conversation, the worker checks the Session Registry (Redis) to find which Gateway server holds the recipient's WebSocket connection.
    6. If the recipient is online: the worker routes the message to the correct Gateway server, which pushes it to the recipient's WebSocket.
    7. If the recipient is offline: the worker enqueues a push notification via the Push Notification Service (APNs/FCM).
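A minimal sketch of steps 5 through 7, with gateway routing and push delivery stubbed out; the "sessions" Redis hash is the Session Registry described in the next section:

```python
import redis

r = redis.Redis()

def route_to_gateway(gateway_id: str, user_id: str, message: dict) -> None:
    ...  # internal RPC to the gateway instance holding the socket (elided)

def enqueue_push_notification(user_id: str, message: dict) -> None:
    ...  # hand off to APNs/FCM via the push service (elided)

def fan_out(message: dict, recipient_ids: list[str]) -> None:
    for user_id in recipient_ids:
        gateway_id = r.hget("sessions", user_id)
        if gateway_id:
            route_to_gateway(gateway_id.decode(), user_id, message)  # online
        else:
            enqueue_push_notification(user_id, message)              # offline
```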

    Fan-Out Strategy

For 1-on-1 chats, fan-out is trivial — one recipient. For group chats, we fan out to all members. With a 500-member cap, this is manageable. If we needed to support channels with millions of subscribers (like a broadcast system), we would switch to fan-out-on-read, but for groups up to 500, fan-out-on-write keeps delivery latency low.

    WebSocket Gateway

    The Gateway is a stateful service that maintains persistent WebSocket connections. Key design considerations:

• Sticky sessions via consistent hashing — A load balancer maps each user to a specific Gateway instance. If a Gateway fails, its clients reconnect and the hash ring remaps them to the surviving instances.
• Session Registry — A Redis hash maps user_id to gateway_instance_id. When a user connects, we register; on disconnect, we deregister. The fan-out worker consults this registry to route messages (sketched after this list).
    • Heartbeats — Clients send periodic pings (every 30 seconds). If three consecutive pings are missed, the server closes the connection and marks the user offline.
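A sketch of the registry operations, matching the Redis hash layout just described (the gateway ID value is illustrative):

```python
import redis

r = redis.Redis()
GATEWAY_ID = "gw-3"  # this instance's identity (example value)

def on_connect(user_id: str) -> None:
    r.hset("sessions", user_id, GATEWAY_ID)   # register on WebSocket open

def on_disconnect(user_id: str) -> None:
    r.hdel("sessions", user_id)               # deregister on close

def lookup(user_id: str) -> str | None:
    gw = r.hget("sessions", user_id)          # used by fan-out workers
    return gw.decode() if gw else None
```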

    Stage 5: Deep Dive

    We will dive deep into two critical subsystems: message ordering and the presence system.

    Deep Dive 1: Message Ordering

    Correct message ordering is one of the hardest problems in distributed chat. Users expect messages to appear in a consistent, intuitive sequence, but network delays, clock skew, and concurrent writes make this challenging.

    ID Generation with Snowflake IDs

    We use a Snowflake-style ID generator that produces 64-bit IDs composed of:

    • 41 bits: Millisecond timestamp (gives ~69 years of range).
    • 10 bits: Machine/worker ID (1,024 workers).
    • 13 bits: Sequence number (8,192 messages per millisecond per worker).

    Because the timestamp is the most significant portion, IDs are roughly time-ordered. Messages from different senders in the same millisecond will have different worker IDs, but that is acceptable — the ordering within a conversation only needs to be consistent, not perfectly causal.
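A single-node sketch of this generator follows. The custom epoch is an assumption (the section does not specify one), and clock-regression handling is omitted:

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # 2020-01-01 UTC; illustrative custom epoch

class SnowflakeGenerator:
    """64-bit IDs: 41-bit timestamp | 10-bit worker | 13-bit sequence."""

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000) - EPOCH_MS
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0x1FFF   # 13-bit sequence
                if self.seq == 0:                    # exhausted this millisecond
                    while now <= self.last_ms:       # spin to the next one
                        now = int(time.time() * 1000) - EPOCH_MS
            else:
                self.seq = 0
            self.last_ms = now
            return (now << 23) | (self.worker_id << 13) | self.seq
```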

    Causal Ordering Within a Conversation

    For stronger ordering guarantees, we use a per-conversation sequence number. The Chat Service maintains an atomic counter per conversation (stored in Redis). When a message arrives:

    1. Increment the counter atomically: INCR conv:{conversation_id}:seq.
    2. Attach the sequence number to the message before writing to Cassandra.
    3. Clients render messages ordered by this sequence number.

This ensures that even if two messages arrive at different Chat Service instances simultaneously, they receive distinct, sequential numbers. The Redis INCR operation is atomic and completes in roughly 0.1 ms, adding negligible latency.
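The counter itself is a one-liner; the key name follows step 1 above:

```python
import redis

r = redis.Redis()

def next_sequence(conversation_id: str) -> int:
    # Atomic even under concurrent Chat Service instances: each caller
    # receives a distinct, monotonically increasing number.
    return r.incr(f"conv:{conversation_id}:seq")
```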

    Handling Client-Side Ordering

    Clients may receive messages out of order due to network jitter. The client maintains a local buffer and sorts by sequence number. If a gap is detected (e.g., received seq 5 and 7 but not 6), the client waits briefly (200ms) for the missing message before requesting it via REST as a fallback.
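A sketch of that buffer, with the 200 ms wait reduced to a synchronous callback (a real client would schedule it on a timer):

```python
class ReorderBuffer:
    def __init__(self, fetch_missing):
        self.next_seq = 1
        self.pending = {}                   # seq -> message held back by a gap
        self.fetch_missing = fetch_missing  # REST fallback: (first, last) range

    def on_message(self, seq: int, message) -> list:
        """Returns the messages now safe to render, in order."""
        self.pending[seq] = message
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        if self.pending:
            # Gap detected (e.g. holding seq 7 while waiting on 6); after the
            # grace period the client fetches the missing range over REST.
            self.fetch_missing(self.next_seq, min(self.pending) - 1)
        return ready
```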

    Deep Dive 2: Presence System

    Presence (online/offline/idle) seems simple but becomes complex at scale. With 50M DAU, we cannot broadcast presence changes to everyone.

    Architecture

    The Presence Service maintains state in Redis with the following schema:

Key | Value | TTL
presence:{user_id} | { status: "online", last_seen: ts, gateway_id: "gw-3" } | 90 seconds

The lifecycle works as follows (a sketch of the gateway side follows the steps):

    1. Connection established: Gateway registers the user in Redis with status online and a 90-second TTL.
    2. Heartbeat received: Gateway refreshes the TTL. If the user is active (sending messages, typing), status remains online. After 60 seconds of inactivity, status transitions to idle.
    3. Connection lost: Gateway deletes the key. If the Gateway crashes, the TTL ensures the key expires automatically — no stale presence data.
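A sketch of the gateway side, with activity tracking simplified to a boolean flag:

```python
import json
import time

import redis

r = redis.Redis()
PRESENCE_TTL = 90  # seconds, matching the table above

def on_heartbeat(user_id: str, gateway_id: str, active: bool) -> None:
    # Each heartbeat rewrites the key and refreshes its TTL; 60 seconds of
    # inactivity downgrades the status to "idle".
    status = "online" if active else "idle"
    payload = {"status": status, "last_seen": int(time.time()), "gateway_id": gateway_id}
    r.set(f"presence:{user_id}", json.dumps(payload), ex=PRESENCE_TTL)

def on_disconnect(user_id: str) -> None:
    # Explicit cleanup; if the gateway crashes before this runs, the TTL
    # expires the key on its own.
    r.delete(f"presence:{user_id}")
```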

    Presence Fan-Out (Subscription Model)

    We do not broadcast presence to all users. Instead, we use a subscription model:

    • When a user opens the app, the client subscribes to presence updates for users visible on screen (recent conversations, group members).
    • The Presence Service maintains a reverse index: presence_subscribers:{user_id} → Set of subscriber user_ids.
    • When a user's status changes, we publish the update only to their subscribers.
    • When a user navigates away from a conversation, the client unsubscribes.

    This reduces presence fan-out from O(N) to O(K), where K is the number of active subscribers per user — typically under 100.
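A sketch of the subscription bookkeeping and the O(K) publish path, with delivery through each subscriber's gateway stubbed:

```python
import redis

r = redis.Redis()

def deliver_to(subscriber_id: str, update: dict) -> None:
    ...  # push presence_update over the subscriber's WebSocket (elided)

def subscribe(viewer_id: str, watched_id: str) -> None:
    r.sadd(f"presence_subscribers:{watched_id}", viewer_id)

def unsubscribe(viewer_id: str, watched_id: str) -> None:
    r.srem(f"presence_subscribers:{watched_id}", viewer_id)

def publish_status_change(user_id: str, update: dict) -> None:
    # Only active subscribers hear about the change: O(K), not O(N).
    for subscriber in r.smembers(f"presence_subscribers:{user_id}"):
        deliver_to(subscriber.decode(), update)
```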

    Consistency Trade-Off

    Presence is inherently eventually consistent. A user might appear online for up to 90 seconds after their connection drops (if the Gateway crashes and cannot deregister). This is an acceptable trade-off — users are accustomed to slight delays in presence indicators. The alternative (strong consistency with a coordination protocol) would add latency to every heartbeat and is not worth the cost.

    Stage 6: Scaling and Trade-Offs

    Connection Management at Scale

    Each WebSocket connection consumes a file descriptor and a small amount of memory (~10-20 KB). A modern server can handle ~500,000 concurrent connections. With 50M DAU and assuming 20% concurrency (10M simultaneous connections), we need approximately 20 Gateway instances.

    Key scaling considerations:

    • Horizontal scaling: Add Gateway instances behind a load balancer. Use consistent hashing so that reconnecting clients are likely to hit the same instance (reducing session registry churn).
    • Graceful shutdown: When scaling down, drain connections gradually. Send a reconnect signal to connected clients so they migrate to another Gateway.
    • Multi-region: Deploy Gateway clusters in multiple regions. Use a global Session Registry (or regional registries with cross-region lookup) so messages can be routed to users connected in a different region.

    Sharding the Message Store

    Cassandra handles sharding natively via consistent hashing on the partition key (conversation_id). However, hot partitions can arise from extremely active group chats. Mitigation strategies:

• Bucketed partitions: For very active conversations, append a time bucket to the partition key: conversation_id#YYYY-MM-DD. This spreads writes across multiple partitions while keeping time-range queries efficient (see the sketch after this list).
    • Compaction strategy: Use TimeWindowCompactionStrategy — it aligns well with time-ordered message data and avoids expensive full compactions.
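A sketch of the bucketed key derivation; the daily granularity follows the example above, while the hot-conversation flag is assumed to come from some upstream heuristic:

```python
from datetime import datetime, timezone

def partition_key(conversation_id: str, created_at: datetime, is_hot: bool) -> str:
    # Hot conversations get a daily bucket appended so writes spread across
    # partitions; history reads walk buckets newest-first until the page fills.
    if not is_hot:
        return conversation_id
    return f"{conversation_id}#{created_at.astimezone(timezone.utc):%Y-%m-%d}"
```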

    Kafka Partitioning for Message Delivery

The message queue is partitioned by conversation_id. This ensures that messages within a conversation are consumed in order by a single fan-out worker. If a consumer group falls behind, we can add partitions, but adding partitions changes the key-to-partition mapping, so in-flight conversations must be drained carefully to avoid reordering.
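A producer sketch using kafka-python; the broker address and topic name are illustrative:

```python
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode(),
)

def publish(message: dict) -> None:
    # Keying by conversation_id routes all of a conversation's messages to
    # the same partition, so one fan-out worker consumes them in order.
    producer.send("chat-messages", key=message["conversationId"], value=message)
```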

    Trade-Off Summary

Decision | Choice | Trade-Off
Message store | Cassandra | High write throughput, but no cross-partition transactions. Acceptable because messages are independent.
Metadata store | PostgreSQL | Strong consistency for conversations/participants, but limited write scalability. Acceptable at our metadata volume.
Fan-out strategy | Write (push to recipients) | Low-latency delivery, but costlier for large groups. 500-member cap keeps it manageable.
Presence consistency | Eventual (TTL-based) | May show stale online status briefly. Acceptable UX trade-off vs. coordination overhead.
Message ordering | Per-conversation sequence counter | Adds a Redis round-trip per message. Negligible at ~0.1 ms, but Redis becomes a dependency.
ID generation | Snowflake IDs | Roughly time-ordered without coordination. Clock skew can cause minor reordering across workers, but sequence numbers compensate.

    Scoring Tips

    To score well on a chat system design question, keep these principles in mind:

    • Lead with requirements. State your scale assumptions (DAU, messages/day, peak QPS) before drawing anything. This frames every subsequent decision.
    • Justify your storage choices. Do not just say "use Cassandra." Explain why the access pattern (append-heavy, time-range queries, partition-per-conversation) matches the storage engine's strengths.
    • Address the hard parts proactively. Message ordering and presence at scale are where interviewers probe. Show you understand the subtlety — clock skew, fan-out costs, eventual consistency trade-offs.
    • Distinguish 1-1 from group chat. The fan-out strategy, storage partitioning, and presence subscription model all change with group size. Interviewers expect you to address this.
    • Show depth over breadth. It is better to deeply explain message ordering (Snowflake IDs + per-conversation sequence counters + client-side buffering) than to superficially mention ten features.
    • Mention failure modes. What happens when a Gateway crashes? (TTL-based presence cleanup, message retries via the queue.) What happens when Cassandra is temporarily unavailable? (Queue absorbs writes, replay on recovery.) These details signal production-level thinking.

    Practice walking through this architecture end-to-end in under 35 minutes. If you can explain each stage concisely while fielding follow-up questions, you are well-prepared for any chat system design interview. Tools like Hoppers AI can help you simulate this experience with real-time feedback on structure, depth, and communication clarity.