
    Design a Notification Service — Complete System Design Walkthrough

    Hoppers AI Team · March 12, 2026 · 12 min read

    Notification services are a staple of system design interviews because they touch on event-driven architecture, multi-channel delivery, rate limiting, and reliability at scale. Every major platform — from e-commerce to social media to fintech — relies on a notification system to keep users informed. In this walkthrough, we will design one from the ground up, following the structured 6-stage approach that top interviewers expect.

    [Figure: Notification Service — High-Level Architecture. Event sources (orders, alerts, marketing, social, system events) call the Notification Service synchronously for validation, templating, and rate limiting; it feeds a P0/P1/P2 priority queue consumed asynchronously by push, SMS, email, and in-app workers, with delivery tracking downstream.]

    Stage 1: Requirements Gathering

    Start by clarifying the scope. A notification service can mean many things — from a simple email sender to a full platform handling billions of events across multiple channels. Spend 3-5 minutes aligning with your interviewer on what matters most.

    Functional Requirements

    • Multi-channel delivery — Support four channels: push notifications (APNs/FCM), SMS, email, and in-app notifications.
    • User preferences — Users can opt in or out of specific notification types per channel (e.g., receive marketing emails but not marketing push notifications).
    • Template system — Notifications are rendered from reusable templates with dynamic variables (user name, order ID, etc.), not hardcoded strings.
    • Rate limiting — Protect users from notification fatigue. Enforce per-user limits (e.g., no more than 5 push notifications per hour) and global limits per notification type.
    • Priority levels — Support at least three priority tiers: P0 (critical — security alerts, OTP), P1 (transactional — order confirmations, shipping updates), P2 (promotional — marketing campaigns, recommendations).
    • Delivery tracking — Track the status of every notification through its lifecycle: created, queued, sent, delivered, failed, read.
    • Scheduling — Support sending notifications at a future time (e.g., "send this campaign at 9 AM in the user's local timezone").

    Non-Functional Requirements

    • Scale: 100 million registered users, 500 million notifications per day (~5,800 notifications/second average, ~20,000/second at peak).
    • Latency: P0 notifications delivered within 5 seconds. P1 within 30 seconds. P2 best-effort within minutes.
    • Availability: 99.99% — missed notifications damage user trust and can have financial or security implications.
    • Reliability: At-least-once delivery guarantee. Duplicate delivery is preferable to missed delivery, but we should minimize duplicates.
    • Extensibility: Adding a new channel (e.g., WhatsApp, Slack) should require implementing a worker, not redesigning the architecture.

    Interview tip: Explicitly stating priority tiers and per-channel delivery SLAs demonstrates that you think about notification systems the way production teams do. Interviewers at companies like Google, Meta, and Amazon will immediately recognize this as a signal of real-world experience.

    Stage 2: API Design

    The notification service exposes APIs for producers (internal services that trigger notifications) and for end users (managing preferences and viewing notification history).

    Producer APIs (Internal)

    Method  Endpoint                        Purpose
    POST    /v1/notifications/send          Send a single notification
    POST    /v1/notifications/batch         Send to multiple recipients (campaigns)
    GET     /v1/notifications/{id}/status   Query delivery status

    Send Notification Payload

    { "templateId": "order_shipped", "userId": "u_abc123", "channels": ["push", "email"], "priority": "P1", "data": { "orderId": "ORD-7891", "trackingUrl": "https://..." }, "scheduledAt": null, "idempotencyKey": "ship-ORD-7891" }

    The idempotencyKey prevents duplicate notifications when producers retry after network failures. The service deduplicates by storing processed keys in a TTL-based cache (Redis) for 24 hours.
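
    A minimal sketch of that dedup check, assuming an ioredis client (the key prefix and handler shape here are illustrative, not a fixed API):

    import Redis from "ioredis";

    const redis = new Redis();

    // SET ... NX EX atomically claims the key with a 24-hour TTL; it returns
    // "OK" on the first claim and null when the key already exists.
    async function claimIdempotencyKey(key: string): Promise<boolean> {
      const result = await redis.set(`idem:${key}`, "1", "EX", 86_400, "NX");
      return result === "OK";
    }

    // In the send handler: drop the request if the key was already processed.
    async function handleSend(req: { idempotencyKey: string }): Promise<void> {
      if (!(await claimIdempotencyKey(req.idempotencyKey))) {
        return; // duplicate producer retry; the original is already in flight
      }
      // ...persist the notification record with status "created" and continue
    }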

    User-Facing APIs

    Method  Endpoint                                                 Purpose
    GET     /v1/users/{userId}/notifications?cursor=&limit=20        Paginated in-app notification feed
    PUT     /v1/users/{userId}/preferences                           Update notification preferences per channel and type
    GET     /v1/users/{userId}/preferences                           Retrieve current preferences
    POST    /v1/users/{userId}/notifications/{notificationId}/read   Mark a notification as read

    Batch Send Payload

    { "templateId": "weekly_digest", "userIds": ["u_1", "u_2", ... ], "channels": ["email"], "priority": "P2", "data": { "weekStart": "2026-03-06" }, "scheduledAt": "2026-03-12T09:00:00Z" }

    For large campaigns (millions of recipients), the batch endpoint accepts a segment ID instead of a user list. The Notification Service resolves the segment asynchronously by querying the user service.

    Design decision: Why not let producers specify the notification body directly? Templates enforce brand consistency, prevent injection vulnerabilities, and allow non-engineers (product, marketing) to update copy without code changes. The producer sends structured data; the Notification Service handles rendering.

    Stage 3: Data Model

    We need storage for four distinct concerns: notification records, templates, user preferences, and delivery logs. Each has different access patterns and scale characteristics.

    Notifications Table (Cassandra / DynamoDB)

    Column           Type                  Role
    notification_id  UUID                  Partition key
    user_id          UUID                  Indexed (for feed queries)
    template_id      string
    channels         list<string>          Target channels
    priority         enum (P0, P1, P2)
    data             JSON                  Template variables
    status           enum                  created, queued, sent, failed
    created_at       timestamp
    scheduled_at     timestamp (nullable)
    idempotency_key  string                Deduplication

    For the in-app notification feed, we need efficient queries by user_id ordered by created_at. In DynamoDB, this means a GSI with PK=USER#{user_id} and SK=NOTIF#{created_at}#{notification_id}. In Cassandra, partition by user_id with clustering on created_at DESC.
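
    As a sketch, the feed query against that GSI with the AWS SDK v3 document client might look like this (the table name, index name, and attribute names are assumptions, not a fixed schema):

    import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
    import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

    const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));

    // One page of a user's in-app feed, newest first: ScanIndexForward: false
    // walks the NOTIF#{created_at}#{notification_id} sort key in reverse.
    async function getFeedPage(userId: string, cursor?: Record<string, any>) {
      return doc.send(new QueryCommand({
        TableName: "notifications",
        IndexName: "user-feed-gsi",
        KeyConditionExpression: "pk = :pk",
        ExpressionAttributeValues: { ":pk": `USER#${userId}` },
        ScanIndexForward: false,
        Limit: 20,
        ExclusiveStartKey: cursor, // LastEvaluatedKey from the previous page
      }));
    }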

    Templates Table (PostgreSQL / DynamoDB)

    Column       Type
    template_id  string (PK)
    version      integer
    channel      enum
    subject      string (nullable, for email)
    body         text (with placeholders like {{userName}})
    created_at   timestamp
    updated_at   timestamp

    Templates are versioned so that in-flight notifications always render with the template that was active when they were created, not a newer version that may have changed the messaging.
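
    A minimal sketch of the placeholder substitution, with the version pinned on the notification record at creation time (the types here are illustrative):

    interface Template {
      templateId: string;
      version: number;   // pinned onto the notification record at creation
      subject?: string;  // email only
      body: string;      // contains {{placeholders}}
    }

    // Substitute {{name}} placeholders from the notification's data payload.
    // Unknown placeholders render as empty strings rather than leaking braces.
    function renderTemplate(template: Template, data: Record<string, string>): string {
      return template.body.replace(/\{\{(\w+)\}\}/g, (_, name: string) => data[name] ?? "");
    }

    // renderTemplate({ templateId: "order_shipped", version: 3,
    //                  body: "Hi {{userName}}, order {{orderId}} has shipped!" },
    //                { userName: "Ada", orderId: "ORD-7891" })
    // => "Hi Ada, order ORD-7891 has shipped!"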

    User Preferences Table

    Column             Type
    user_id            UUID (composite PK with channel, category)
    channel            enum (push, sms, email, in_app)
    category           string (transactional, marketing, social, security)
    enabled            boolean
    quiet_hours_start  time (nullable)
    quiet_hours_end    time (nullable)
    timezone           string

    The composite key is (user_id, channel, category). This allows fine-grained control: a user can receive transactional push notifications but opt out of marketing pushes, while still receiving marketing emails.

    Delivery Logs Table (Cassandra / S3 for archival)

    Column             Type
    notification_id    UUID
    channel            enum
    attempt            integer
    status             enum (sent, delivered, bounced, failed)
    provider_response  JSON
    attempted_at       timestamp

    Delivery logs are write-heavy and append-only — a natural fit for Cassandra or a time-series store. Older logs (beyond 30 days) can be archived to S3 in Parquet format for cost-efficient analytics.

    Stage 4: High-Level Architecture

    The architecture follows an event-driven, queue-based pattern that decouples notification ingestion from delivery. This is essential for handling bursty traffic (e.g., a flash sale generating millions of notifications simultaneously).

    End-to-End Flow

    1. Producer sends request to the Notification Service via REST API. The service validates the payload, checks the idempotency key, and persists the notification record with status created.
    2. Preference check — The service queries user preferences to determine which channels are enabled for this notification category. If the user has opted out of all requested channels, the notification is marked skipped and no further processing occurs.
    3. Template rendering — The service fetches the template for each enabled channel and renders it with the provided data variables. Each channel may have a different template (push notifications are short; emails are rich HTML).
    4. Rate limit check — The service checks per-user rate limits using a sliding window log in Redis. If the limit is exceeded, P2 notifications are deferred (re-queued with a delay); P0 notifications bypass rate limiting entirely, and P1 notifications run against much higher limits (see the tiered table in Stage 5).
    5. Enqueue — The rendered notification is placed onto a priority queue (Kafka or SQS with separate queues per priority). P0 messages go to a dedicated high-priority topic with more consumers.
    6. Channel workers consume from the queue and dispatch to the appropriate external provider: APNs/FCM for push, Twilio/AWS SNS for SMS, SES/SendGrid for email, or write directly to the in-app feed store.
    7. Delivery tracking — Each worker writes a delivery log entry. For push and email, the worker processes webhook callbacks from providers (delivery receipts, bounces, complaints) and updates the notification status.

    Component Responsibilities

    • Notification Service — validation, dedup, preferences, templating, rate limiting, routing. Tech choice: stateless microservice (horizontally scalable).
    • Priority Queue — decouples ingestion from delivery; priority-based consumption. Tech choice: Kafka (3 topics: P0, P1, P2) or SQS with separate queues.
    • Channel Workers — channel-specific delivery logic, provider SDK integration. Tech choice: consumer groups, auto-scaled by queue depth.
    • Provider Abstraction — unified interface over multiple providers per channel. Tech choice: strategy pattern (e.g., EmailProvider interface with SES and SendGrid implementations).
    • Delivery Tracker — webhook ingestion, status updates, analytics. Tech choice: separate service consuming provider callbacks.
    • Scheduler — holds scheduled notifications and releases them at the right time. Tech choice: DynamoDB TTL + Lambda trigger, or a delayed queue.

    Provider Abstraction Layer

    Each channel worker does not talk to a specific provider directly. Instead, it calls a Provider Abstraction Layer that exposes a uniform interface:

    interface NotificationProvider {
      send(recipient: string, content: RenderedContent): DeliveryResult
    }

    Behind this interface, we can have multiple implementations per channel. For email: SES (primary), SendGrid (fallback). For SMS: Twilio (primary), AWS SNS (fallback). This abstraction is critical for provider failover — if one provider's API is degraded, we route traffic to the backup without changing any worker code.
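
    A sketch of that failover wrapper, restating the interface with assumed result types for self-containment (isHealthy would consult the circuit-breaker state described in Stage 6):

    type RenderedContent = { subject?: string; body: string };
    type DeliveryResult = { ok: boolean; providerMessageId?: string };

    interface NotificationProvider {
      send(recipient: string, content: RenderedContent): DeliveryResult;
    }

    // The worker only ever sees a NotificationProvider; routing is internal.
    class FailoverProvider implements NotificationProvider {
      constructor(
        private primary: NotificationProvider,
        private fallback: NotificationProvider,
        private isHealthy: () => boolean, // circuit-breaker state, e.g. read from Redis
      ) {}

      send(recipient: string, content: RenderedContent): DeliveryResult {
        const first = this.isHealthy() ? this.primary : this.fallback;
        try {
          return first.send(recipient, content);
        } catch {
          // Last-ditch attempt through the other provider before surfacing a failure.
          const second = first === this.primary ? this.fallback : this.primary;
          return second.send(recipient, content);
        }
      }
    }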

    Stage 5: Deep Dive

    We will go deep on two critical subsystems: per-user rate limiting and the retry/delivery guarantee mechanism.

    Deep Dive 1: Rate Limiting Per User

    Notification fatigue is real. Bombarding users with excessive notifications leads to app uninstalls, email unsubscribes, and brand damage. Rate limiting is not just a technical feature — it is a business requirement.

    Sliding Window Log (Redis)

    We implement a sliding window log algorithm using Redis sorted sets. For each user and channel combination:

    • Key: ratelimit:{user_id}:{channel}
    • Members: Notification timestamps (as scores and values)
    • Operations per notification:
      1. ZREMRANGEBYSCORE — Remove entries older than the window (e.g., 1 hour ago).
      2. ZCARD — Count remaining entries. If the count has already reached the limit (e.g., 5 for push), reject or defer.
      3. ZADD — Add the current timestamp if the notification is allowed.
    • TTL: Set a TTL on the key equal to the window size to prevent memory leaks from inactive users.

    All three operations are wrapped in a Lua script for atomicity:

    EVAL "redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, ARGV[1]) local count = redis.call('ZCARD', KEYS[1]) if count < tonumber(ARGV[2]) then redis.call('ZADD', KEYS[1], ARGV[3], ARGV[3]) redis.call('EXPIRE', KEYS[1], ARGV[4]) return 1 end return 0" 1 ratelimit:{user_id}:push {window_start} {limit} {now} {ttl_seconds}

    Tiered Rate Limits

    Priority            Push (per hour)  SMS (per day)  Email (per day)  In-App
    P0 (Critical)       Unlimited        Unlimited      Unlimited        Unlimited
    P1 (Transactional)  20               10             30               Unlimited
    P2 (Promotional)    5                3              5                20

    P0 notifications (OTP codes, security alerts, fraud warnings) always bypass rate limits. You never want a user locked out of their account because they hit a rate limit on verification codes.

    Quiet Hours

    If the user has configured quiet hours (e.g., 10 PM to 8 AM), non-critical notifications are deferred to the next available window. The Notification Service checks the user's timezone and quiet hours during the preference check step and adjusts the scheduledAt accordingly. P0 notifications ignore quiet hours.
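
    A rough sketch of that deferral, assuming quiet hours are stored as whole local hours; the hour-by-hour walk is a deliberate simplification that sidesteps DST arithmetic:

    // Local hour (0-23) for an instant in the user's timezone.
    function localHour(at: Date, timeZone: string): number {
      return Number(
        new Intl.DateTimeFormat("en-US", { timeZone, hour: "numeric", hourCycle: "h23" }).format(at),
      );
    }

    // A quiet window may wrap midnight, e.g. 22 -> 8.
    function inQuietHours(hour: number, start: number, end: number): boolean {
      return start < end ? hour >= start && hour < end : hour >= start || hour < end;
    }

    // Defer a non-P0 notification to the first hour outside quiet hours.
    function deferPastQuietHours(scheduledAt: Date, timeZone: string, start: number, end: number): Date {
      if (start === end) return scheduledAt; // degenerate window: treat as no quiet hours
      const adjusted = new Date(scheduledAt);
      while (inQuietHours(localHour(adjusted, timeZone), start, end)) {
        adjusted.setTime(adjusted.getTime() + 60 * 60 * 1000); // step forward one hour
      }
      return adjusted;
    }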

    Deep Dive 2: Retry and Delivery Guarantee

    Our target is at-least-once delivery. This means every notification must either be successfully delivered or exhaust its retry budget before being marked as permanently failed.

    Retry Strategy: Exponential Backoff with Jitter

    When a channel worker fails to deliver a notification (provider timeout, 5xx response, network error), it re-enqueues the message with a delay:

    • Attempt 1: Immediate
    • Attempt 2: 30 seconds + random jitter (0-10s)
    • Attempt 3: 2 minutes + random jitter (0-30s)
    • Attempt 4: 10 minutes + random jitter (0-60s)
    • Attempt 5: 1 hour (final attempt)

    The jitter prevents thundering herd problems when a provider recovers from an outage and thousands of retries fire simultaneously.
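
    The schedule above reduces to a small lookup; here attempt is zero-indexed, so attempt 0 is the immediate first try (a sketch, not a prescribed config):

    // Base delay and jitter cap per attempt, mirroring the schedule above.
    const BASE_DELAY_MS = [0, 30_000, 120_000, 600_000, 3_600_000];
    const JITTER_CAP_MS = [0, 10_000, 30_000, 60_000, 0];

    // Re-enqueue delay for a failed attempt, or null once the retry budget
    // is exhausted (the message then moves to the DLQ).
    function retryDelayMs(attempt: number): number | null {
      if (attempt >= BASE_DELAY_MS.length) return null;
      const jitter = Math.floor(Math.random() * (JITTER_CAP_MS[attempt] + 1));
      return BASE_DELAY_MS[attempt] + jitter;
    }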

    Dead Letter Queue

    After exhausting all retry attempts, the notification moves to a Dead Letter Queue (DLQ). An operations team monitors the DLQ dashboard and can:

    • Bulk retry — If the failure was due to a temporary provider outage, replay all DLQ messages.
    • Route to alternate provider — Replay DLQ messages through the fallback provider.
    • Mark as permanently failed — For invalid device tokens, unsubscribed phone numbers, or bounced email addresses.

    Idempotency and Deduplication

    At-least-once delivery means duplicates are possible. We mitigate this at two levels:

    1. Producer-level: The idempotencyKey in the send request prevents the same event from creating multiple notification records. The service stores processed keys in Redis with a 24-hour TTL.
    2. Worker-level: Each delivery attempt is logged with a (notification_id, channel, attempt) tuple. Before sending, the worker checks whether a successful delivery already exists for this notification and channel. If so, it skips the send.
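
    A sketch of that worker-level guard; the delivery-log client and its query method are placeholders for whatever backs the delivery logs table:

    interface DeliveryLogStore {
      hasSuccess(notificationId: string, channel: string): Promise<boolean>;
    }
    declare const deliveryLog: DeliveryLogStore; // hypothetical delivery-log client

    // Skip the send when a successful delivery is already recorded for this
    // notification and channel -- a duplicate from at-least-once redelivery.
    async function sendOnce(
      notificationId: string,
      channel: string,
      dispatch: () => Promise<void>,
    ): Promise<void> {
      if (await deliveryLog.hasSuccess(notificationId, channel)) return;
      await dispatch();
    }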

    Ensuring No Message Loss in the Queue

    The queue itself must guarantee durability:

    • Kafka: Use acks=all with replication factor 3. Messages are persisted to disk before acknowledgment. Consumer offsets are committed only after successful processing (at-least-once semantics).
    • SQS: Messages are redundantly stored across multiple AZs. Visibility timeout ensures a message is re-delivered if the consumer crashes before acknowledging it.
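
    On the Kafka side, the durable-producer setting might look like this with kafkajs, where acks: -1 is that client's spelling of acks=all (broker addresses and topic naming are assumptions):

    import { Kafka } from "kafkajs";

    const kafka = new Kafka({ brokers: ["broker1:9092", "broker2:9092", "broker3:9092"] });
    const producer = kafka.producer();

    // acks: -1 waits for all in-sync replicas before the send resolves, so a
    // single broker loss cannot drop an acknowledged message.
    async function enqueue(priority: "P0" | "P1" | "P2", payload: object): Promise<void> {
      await producer.send({
        topic: `notifications.${priority}`, // one topic per priority tier
        acks: -1,
        messages: [{ value: JSON.stringify(payload) }],
      });
    }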

    The Notification Service writes the notification to the database before enqueuing it. If the enqueue fails, a periodic reconciliation job scans for notifications in created status that are older than 5 minutes and re-enqueues them. This belt-and-suspenders approach ensures no notification is silently lost.
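
    A sketch of that reconciliation job, with the store and queue clients as placeholders:

    interface NotificationStore {
      findByStatusOlderThan(status: string, ageMs: number): Promise<Array<{ id: string; priority: string }>>;
    }
    interface PriorityQueue {
      enqueue(notificationId: string, priority: string): Promise<void>;
    }
    declare const store: NotificationStore; // hypothetical database client
    declare const queue: PriorityQueue;     // hypothetical queue client

    // Runs every few minutes: anything still "created" after 5 minutes was
    // persisted but never enqueued -- push it onto the queue again. Re-enqueueing
    // is safe because the worker-level dedup check absorbs any repeats.
    async function reconcile(): Promise<void> {
      const stuck = await store.findByStatusOlderThan("created", 5 * 60 * 1000);
      for (const n of stuck) {
        await queue.enqueue(n.id, n.priority);
      }
    }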

    Stage 6: Scaling and Trade-Offs

    Provider Failover

    External providers are the biggest source of unreliability in a notification system. A provider outage should not mean failed notifications — it should mean automatic rerouting.

    The Provider Abstraction Layer maintains a health score per provider, updated via a circuit breaker pattern:

    • Closed (healthy): All traffic goes to the primary provider. If the error rate exceeds 10% over 60 seconds, trip the circuit.
    • Open (unhealthy): All traffic routes to the fallback provider. After 5 minutes, transition to half-open.
    • Half-open: Send 10% of traffic to the primary provider. If success rate exceeds 90%, close the circuit (return to primary).

    Health scores are stored in Redis and shared across all worker instances so that failover decisions are coordinated. This prevents a split-brain scenario where some workers send to the unhealthy provider while others have already failed over.
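
    An in-process sketch of those transitions, using the thresholds from the list above; a count-based window stands in for the 60-second one, and in production the counters and state would live in Redis as described so every worker agrees:

    type CircuitState = "closed" | "open" | "half-open";

    class CircuitBreaker {
      private state: CircuitState = "closed";
      private openedAt = 0;
      private successes = 0;
      private failures = 0;

      // Record a delivery outcome and apply the transition rules.
      record(ok: boolean): void {
        if (ok) this.successes++; else this.failures++;
        const total = this.successes + this.failures;
        if (this.state === "closed" && total >= 20 && this.failures / total > 0.1) {
          this.trip(); // error rate above 10%: open the circuit
        } else if (this.state === "half-open" && total >= 20) {
          if (this.successes / total > 0.9) this.reset("closed"); // primary recovered
          else this.trip();
        }
      }

      // Should this request go to the primary provider?
      usePrimary(): boolean {
        if (this.state === "open" && Date.now() - this.openedAt > 5 * 60 * 1000) {
          this.reset("half-open"); // cooldown elapsed: start probing
        }
        if (this.state === "half-open") return Math.random() < 0.1; // 10% probe traffic
        return this.state === "closed";
      }

      private trip(): void {
        this.state = "open";
        this.openedAt = Date.now();
        this.successes = this.failures = 0;
      }

      private reset(to: CircuitState): void {
        this.state = to;
        this.successes = this.failures = 0;
      }
    }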

    Batching and Digest Mode

    For high-volume notification types (e.g., social media "someone liked your post"), sending individual notifications for each event is wasteful and annoying. Instead, we aggregate events into digests:

    • When a P2 notification arrives and the user already has unread notifications of the same type within the last hour, the service merges them: "3 people liked your post" instead of three separate notifications.
    • For email, a digest job runs periodically (e.g., hourly or daily) and batches all pending notifications into a single email rendered from a digest template.
    • Digest aggregation uses Redis sorted sets keyed by digest:{user_id}:{category}. Individual events are added as members. The digest job reads the set, renders the template, sends the notification, and clears the set.
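
    A sketch of that aggregation with ioredis, using the key shape above (the digest job's render-and-send path is elided):

    import Redis from "ioredis";

    const redis = new Redis();

    // Buffer a P2 event into the user's digest set instead of sending it.
    async function bufferDigestEvent(userId: string, category: string, eventJson: string): Promise<void> {
      const key = `digest:${userId}:${category}`;
      await redis.zadd(key, Date.now(), eventJson); // score = event time, member = payload
      await redis.expire(key, 24 * 60 * 60);        // safety TTL if the digest job misses it
    }

    // Digest job: drain the set, hand the events to the template renderer, clear it.
    // A MULTI/EXEC or Lua script would close the small race between read and delete.
    async function flushDigest(userId: string, category: string): Promise<string[]> {
      const key = `digest:${userId}:${category}`;
      const events = await redis.zrange(key, 0, -1); // oldest first
      if (events.length > 0) await redis.del(key);
      return events; // e.g. three "like" events become "3 people liked your post"
    }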

    Cross-Region Delivery

    With 100M users globally, a single-region architecture introduces latency for distant users and creates a single point of failure.

    • Multi-region queues: Deploy Kafka clusters in each major region (US, EU, APAC). The Notification Service routes messages to the queue in the user's home region based on their profile data.
    • Regional channel workers: Each region has its own pool of workers. This reduces latency to providers (APNs/FCM have regional endpoints) and keeps data closer to users for compliance (GDPR — EU user data processed in EU).
    • Global notification store: Use DynamoDB Global Tables or Cassandra multi-datacenter replication for the notification feed. Users traveling across regions can still access their notification history with low latency.
    • Cross-region event forwarding: If a notification must be delivered to a user in a different region from the producer, the Notification Service publishes to a cross-region event bus (EventBridge or Kafka MirrorMaker) rather than sending directly. The destination region's workers handle actual delivery.

    Capacity Planning

    At 500M notifications/day across four channels:

    • Push: ~200M/day (40%). FCM and APNs can handle millions per second. The bottleneck is our worker throughput, not the provider. Budget 50 push workers at peak.
    • Email: ~150M/day (30%). SES supports 50,000/second in production. Budget 30 email workers.
    • In-app: ~120M/day (24%). Direct database writes, no external provider. Budget 20 workers.
    • SMS: ~30M/day (6%). SMS is expensive ($0.01-0.05 per message) and slow (provider rate limits). Budget 20 SMS workers with careful rate management.

    Worker auto-scaling is driven by queue depth metrics. When the P0 queue depth exceeds 1,000, scale up immediately. For P2, tolerate higher queue depths before scaling to control costs.

    Trade-Off Summary

    • Delivery guarantee — at-least-once. Possible duplicates, but no missed notifications; dedup logic mitigates most duplicates.
    • Queue architecture — separate queues per priority. More infrastructure to manage, but P0 latency is never impacted by P2 volume spikes.
    • Rate limiting — sliding window (Redis). Adds ~1ms latency per notification; Redis becomes a dependency, but it is already in the critical path for dedup.
    • Template rendering — server-side. Producers cannot customize delivery content, but this ensures brand consistency and prevents injection.
    • Provider abstraction — circuit breaker failover. Adds complexity, but provider outages become transparent to the rest of the system.
    • Digest mode — aggregation in Redis. Delayed delivery for low-priority notifications; acceptable for P2, and P0/P1 are never digested.

    Scoring Tips

    To score well on a notification service design question, keep these principles in mind:

    • Start with channels and priorities. This immediately shows you understand that not all notifications are equal. A security alert and a marketing email have fundamentally different SLAs, and your architecture should reflect that.
    • Emphasize the provider abstraction. Interviewers love to ask "what happens when your SMS provider goes down?" If you have already described a circuit breaker with automatic failover, you have answered the question before it is asked.
    • Show the queue is the backbone. The priority queue decouples ingestion from delivery, absorbs traffic spikes, enables retries, and allows independent scaling of each channel. Make it clear that the queue is not an afterthought — it is the core architectural decision.
    • Rate limiting is a feature, not a footnote. Many candidates mention rate limiting in passing. Go deep: explain the algorithm (sliding window vs. token bucket), the granularity (per-user, per-channel, per-category), and the exceptions (P0 bypasses limits). This signals real-world maturity.
    • Address the failure path explicitly. Walk through what happens when a delivery attempt fails: retry with backoff, DLQ after exhaustion, reconciliation job for lost messages. Production systems are defined by how they handle failure, not how they handle the happy path.
    • Do not forget user preferences. A notification service that ignores user opt-outs is not just bad design — it violates regulations like GDPR and CAN-SPAM. Mention quiet hours and per-category controls to demonstrate you think about the user experience.

    Practice delivering this walkthrough in under 35 minutes, explaining each stage concisely while leaving room for interviewer follow-ups. Tools like Hoppers AI can help you rehearse with real-time feedback on structure, depth, and pacing — so you walk into the interview already knowing your weak spots.