Design a Notification Service — Complete System Design Walkthrough
Notification services are a staple of system design interviews because they touch on event-driven architecture, multi-channel delivery, rate limiting, and reliability at scale. Every major platform — from e-commerce to social media to fintech — relies on a notification system to keep users informed. In this walkthrough, we will design one from the ground up, following the structured 6-stage approach that top interviewers expect.
Stage 1: Requirements Gathering
Start by clarifying the scope. A notification service can mean many things — from a simple email sender to a full platform handling billions of events across multiple channels. Spend 3-5 minutes aligning with your interviewer on what matters most.
Functional Requirements
- Multi-channel delivery — Support four channels: push notifications (APNs/FCM), SMS, email, and in-app notifications.
- User preferences — Users can opt in or out of specific notification types per channel (e.g., receive marketing emails but not marketing push notifications).
- Template system — Notifications are rendered from reusable templates with dynamic variables (user name, order ID, etc.), not hardcoded strings.
- Rate limiting — Protect users from notification fatigue. Enforce per-user limits (e.g., no more than 5 push notifications per hour) and global limits per notification type.
- Priority levels — Support at least three priority tiers: P0 (critical — security alerts, OTP), P1 (transactional — order confirmations, shipping updates), P2 (promotional — marketing campaigns, recommendations).
- Delivery tracking — Track the status of every notification through its lifecycle: created, queued, sent, delivered, failed, read.
- Scheduling — Support sending notifications at a future time (e.g., "send this campaign at 9 AM in the user's local timezone").
Non-Functional Requirements
- Scale: 100 million registered users, 500 million notifications per day (~5,800 notifications/second average, ~20,000/second at peak).
- Latency: P0 notifications delivered within 5 seconds. P1 within 30 seconds. P2 best-effort within minutes.
- Availability: 99.99% — missed notifications damage user trust and can have financial or security implications.
- Reliability: At-least-once delivery guarantee. Duplicate delivery is preferable to missed delivery, but we should minimize duplicates.
- Extensibility: Adding a new channel (e.g., WhatsApp, Slack) should require implementing a worker, not redesigning the architecture.
Interview tip: Explicitly stating priority tiers and per-channel delivery SLAs demonstrates that you think about notification systems the way production teams do. Interviewers at companies like Google, Meta, and Amazon will immediately recognize this as a signal of real-world experience.
Stage 2: API Design
The notification service exposes APIs for producers (internal services that trigger notifications) and for end users (managing preferences and viewing notification history).
Producer APIs (Internal)
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /v1/notifications/send | Send a single notification |
| POST | /v1/notifications/batch | Send to multiple recipients (campaigns) |
| GET | /v1/notifications/{id}/status | Query delivery status |
Send Notification Payload
```json
{
  "templateId": "order_shipped",
  "userId": "u_abc123",
  "channels": ["push", "email"],
  "priority": "P1",
  "data": {
    "orderId": "ORD-7891",
    "trackingUrl": "https://..."
  },
  "scheduledAt": null,
  "idempotencyKey": "ship-ORD-7891"
}
```
The `idempotencyKey` prevents duplicate notifications when producers retry after network failures. The service deduplicates by storing processed keys in a TTL-based cache (Redis) for 24 hours.
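The dedup check can be sketched as follows — a minimal in-memory stand-in for the Redis cache, with an injectable clock so the TTL behavior is testable (class and method names are illustrative, not part of any real API):

```python
import time

IDEMPOTENCY_TTL_SECONDS = 24 * 3600  # keys expire after 24 hours


class IdempotencyCache:
    """In-memory stand-in for Redis `SET key NX EX`: claim() returns True
    only the first time a key is seen within the TTL window."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._seen = {}  # idempotency_key -> expiry timestamp

    def claim(self, key: str) -> bool:
        now = self._clock()
        expiry = self._seen.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate within 24h: drop the request
        self._seen[key] = now + IDEMPOTENCY_TTL_SECONDS
        return True
```

With Redis, the equivalent check is a single atomic command (`SET key 1 NX EX 86400`), which makes it safe across many stateless service instances.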
User-Facing APIs
| Method | Endpoint | Purpose |
|---|---|---|
| GET | /v1/users/{id}/notifications?cursor=&limit=20 | Paginated in-app notification feed |
| PUT | /v1/users/{id}/preferences | Update notification preferences per channel and type |
| GET | /v1/users/{id}/preferences | Retrieve current preferences |
| POST | /v1/users/{userId}/notifications/{notificationId}/read | Mark a notification as read |
Batch Send Payload
```json
{
  "templateId": "weekly_digest",
  "userIds": ["u_1", "u_2", ...],
  "channels": ["email"],
  "priority": "P2",
  "data": { "weekStart": "2026-03-06" },
  "scheduledAt": "2026-03-12T09:00:00Z"
}
```
For large campaigns (millions of recipients), the batch endpoint accepts a segment ID instead of a user list. The Notification Service resolves the segment asynchronously by querying the user service.
Design decision: Why not let producers specify the notification body directly? Templates enforce brand consistency, prevent injection vulnerabilities, and allow non-engineers (product, marketing) to update copy without code changes. The producer sends structured data; the Notification Service handles rendering.
Stage 3: Data Model
We need storage for four distinct concerns: notification records, templates, user preferences, and delivery logs. Each has different access patterns and scale characteristics.
Notifications Table (Cassandra / DynamoDB)
| Column | Type | Role |
|---|---|---|
| notification_id | UUID | Partition key |
| user_id | UUID | Indexed (for feed queries) |
| template_id | string | |
| channels | list<string> | Target channels |
| priority | enum (P0, P1, P2) | |
| status | enum | created, queued, sent, delivered, failed, read |
| data | JSON | Template variables |
| created_at | timestamp | |
| scheduled_at | timestamp (nullable) | |
| idempotency_key | string | Deduplication |
For the in-app notification feed, we need efficient queries by `user_id` ordered by `created_at`. In DynamoDB, this means a GSI with `PK = USER#{user_id}` and `SK = NOTIF#{created_at}#{notification_id}`. In Cassandra, partition by `user_id` with clustering on `created_at DESC`.
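A small sketch of the key-building logic (function names hypothetical): zero-padding the timestamp makes lexicographic order match chronological order, which is exactly what a descending range query on the sort key relies on.

```python
def feed_partition_key(user_id: str) -> str:
    """Build the GSI partition key USER#{user_id}."""
    return f"USER#{user_id}"


def feed_sort_key(created_at_epoch_ms: int, notification_id: str) -> str:
    """Build the GSI sort key NOTIF#{created_at}#{notification_id}.

    The timestamp is zero-padded to a fixed width so that string order
    equals time order; the notification ID suffix breaks ties between
    notifications created in the same millisecond."""
    return f"NOTIF#{created_at_epoch_ms:013d}#{notification_id}"
```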
Templates Table (PostgreSQL / DynamoDB)
| Column | Type |
|---|---|
| template_id | string (PK) |
| version | integer |
| channel | enum |
| subject | string (nullable, for email) |
| body | text (with placeholders like {{userName}}) |
| created_at | timestamp |
| updated_at | timestamp |
Templates are versioned so that in-flight notifications always render with the template that was active when they were created, not a newer version that may have changed the messaging.
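As an illustration, a minimal renderer for the `{{placeholder}}` syntax might look like the sketch below. In production you would use a proper template engine with escaping; this version simply fails loudly on a missing variable rather than sending a half-rendered notification.

```python
import re

# Matches {{name}} with optional internal whitespace, e.g. {{ userName }}
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(body: str, data: dict) -> str:
    """Substitute {{placeholder}} tokens with values from `data`.

    Raises KeyError on a missing variable so a bad payload is rejected
    instead of producing a broken message."""
    def replace(match):
        name = match.group(1)
        if name not in data:
            raise KeyError(f"missing template variable: {name}")
        return str(data[name])

    return _PLACEHOLDER.sub(replace, body)
```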
User Preferences Table
| Column | Type |
|---|---|
| user_id | UUID (PK) |
| channel | enum (push, sms, email, in_app) |
| category | string (transactional, marketing, social, security) |
| enabled | boolean |
| quiet_hours_start | time (nullable) |
| quiet_hours_end | time (nullable) |
| timezone | string |
The composite key is `(user_id, channel, category)`. This allows fine-grained control: a user can receive transactional push notifications but opt out of marketing pushes, while still receiving marketing emails.
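The preference check then reduces to a lookup keyed by (channel, category). A sketch, assuming an opt-out model where a missing row means "enabled" (the function name and data shape are illustrative):

```python
def enabled_channels(requested, category, prefs):
    """Filter the requested channels down to those the user has enabled
    for this notification category.

    `prefs` maps (channel, category) -> enabled flag; a missing entry
    defaults to enabled (opt-out model)."""
    return [ch for ch in requested if prefs.get((ch, category), True)]
```

If the returned list is empty, the notification is marked skipped and processing stops.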
Delivery Logs Table (Cassandra / S3 for archival)
| Column | Type |
|---|---|
| notification_id | UUID |
| channel | enum |
| attempt | integer |
| status | enum (sent, delivered, bounced, failed) |
| provider_response | JSON |
| attempted_at | timestamp |
Delivery logs are write-heavy and append-only — a natural fit for Cassandra or a time-series store. Older logs (beyond 30 days) can be archived to S3 in Parquet format for cost-efficient analytics.
Stage 4: High-Level Architecture
The architecture follows an event-driven, queue-based pattern that decouples notification ingestion from delivery. This is essential for handling bursty traffic (e.g., a flash sale generating millions of notifications simultaneously).
End-to-End Flow
- Ingestion — A producer sends a request to the Notification Service via the REST API. The service validates the payload, checks the idempotency key, and persists the notification record with status `created`.
- Preference check — The service queries user preferences to determine which channels are enabled for this notification category. If the user has opted out of all requested channels, the notification is marked `skipped` and no further processing occurs.
- Template rendering — The service fetches the template for each enabled channel and renders it with the provided data variables. Each channel may have a different template (push notifications are short; emails are rich HTML).
- Rate limit check — The service checks per-user rate limits using a sliding window counter in Redis. If the limit is exceeded, P2 notifications are deferred (re-queued with a delay); P0 notifications bypass rate limiting entirely, and P1 limits are generous enough that transactional traffic rarely hits them.
- Enqueue — The rendered notification is placed onto a priority queue (Kafka or SQS with separate queues per priority). P0 messages go to a dedicated high-priority topic with more consumers.
- Channel workers consume from the queue and dispatch to the appropriate external provider: APNs/FCM for push, Twilio/AWS SNS for SMS, SES/SendGrid for email, or write directly to the in-app feed store.
- Delivery tracking — Each worker writes a delivery log entry. For push and email, the worker processes webhook callbacks from providers (delivery receipts, bounces, complaints) and updates the notification status.
Component Responsibilities
| Component | Responsibility | Tech Choice |
|---|---|---|
| Notification Service | Validation, dedup, preferences, templating, rate limiting, routing | Stateless microservice (horizontally scalable) |
| Priority Queue | Decouple ingestion from delivery, priority-based consumption | Kafka (3 topics: P0, P1, P2) or SQS with separate queues |
| Channel Workers | Channel-specific delivery logic, provider SDK integration | Consumer groups, auto-scaled by queue depth |
| Provider Abstraction | Unified interface over multiple providers per channel | Strategy pattern (e.g., EmailProvider interface with SES and SendGrid implementations) |
| Delivery Tracker | Webhook ingestion, status updates, analytics | Separate service consuming provider callbacks |
| Scheduler | Hold scheduled notifications and release at the right time | DynamoDB TTL + Lambda trigger, or a delayed queue |
Provider Abstraction Layer
Each channel worker does not talk to a specific provider directly. Instead, it calls a Provider Abstraction Layer that exposes a uniform interface:
```typescript
interface NotificationProvider {
  send(recipient: string, content: RenderedContent): DeliveryResult;
}
```
Behind this interface, we can have multiple implementations per channel. For email: SES (primary), SendGrid (fallback). For SMS: Twilio (primary), AWS SNS (fallback). This abstraction is critical for provider failover — if one provider's API is degraded, we route traffic to the backup without changing any worker code.
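A Python sketch of this primary/fallback strategy, with stub types standing in for real provider SDKs (all names here are illustrative):

```python
from dataclasses import dataclass


@dataclass
class DeliveryResult:
    ok: bool
    provider: str
    detail: str = ""


class FailoverSender:
    """Try the primary provider first; on failure, fall back.

    Each provider is any object exposing
    send(recipient, content) -> DeliveryResult, mirroring the
    NotificationProvider interface above."""

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback

    def send(self, recipient: str, content: str) -> DeliveryResult:
        result = self.primary.send(recipient, content)
        if result.ok:
            return result
        # Primary failed: route the same message to the backup provider.
        return self.fallback.send(recipient, content)
```

In practice the routing decision also consults the circuit-breaker health scores described in Stage 6, rather than retrying the primary on every send.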
Stage 5: Deep Dive
We will go deep on two critical subsystems: per-user rate limiting and the retry/delivery guarantee mechanism.
Deep Dive 1: Rate Limiting Per User
Notification fatigue is real. Bombarding users with excessive notifications leads to app uninstalls, email unsubscribes, and brand damage. Rate limiting is not just a technical feature — it is a business requirement.
Sliding Window Counter (Redis)
We implement a sliding window log algorithm using Redis sorted sets. For each user and channel combination:
- Key: `ratelimit:{user_id}:{channel}`
- Members: notification timestamps (stored as both score and value)
- Operations per notification:
  - `ZREMRANGEBYSCORE` — remove entries older than the window (e.g., 1 hour ago)
  - `ZCARD` — count the remaining entries; if the count is at the limit (e.g., 5 for push), reject or defer
  - `ZADD` — add the current timestamp if the notification is allowed
- TTL: set a TTL on the key equal to the window size to prevent memory leaks from inactive users.
All three operations are wrapped in a Lua script for atomicity:
```lua
-- KEYS[1] = ratelimit:{user_id}:{channel}
-- ARGV[1] = window start (now - window size)
-- ARGV[2] = limit
-- ARGV[3] = current timestamp
-- ARGV[4] = ttl_seconds
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, ARGV[1])
local count = redis.call('ZCARD', KEYS[1])
if count < tonumber(ARGV[2]) then
  redis.call('ZADD', KEYS[1], ARGV[3], ARGV[3])
  redis.call('EXPIRE', KEYS[1], ARGV[4])
  return 1
end
return 0
```
Invoked as `EVAL <script> 1 ratelimit:{user_id}:push {window_start} {limit} {now} {ttl_seconds}`; a return value of 1 means the notification is allowed.
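For illustration, the same sliding-window log logic in plain Python — an in-memory stand-in for the Redis sorted set, useful for unit-testing the policy (names are illustrative):

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory equivalent of the Redis sorted-set limiter: keep one
    timestamp per allowed notification and count those inside the window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def allow(self, key: str, now: float) -> bool:
        events = self._events[key]
        # ZREMRANGEBYSCORE: drop timestamps that fell out of the window
        while events and events[0] <= now - self.window:
            events.popleft()
        # ZCARD: reject if the window is already full
        if len(events) >= self.limit:
            return False
        # ZADD: record this notification
        events.append(now)
        return True
```

Unlike the Lua version, this is per-process state; the Redis script exists precisely so that all service instances share one atomic counter.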
Tiered Rate Limits
| Priority | Push (per hour) | SMS (per day) | Email (per day) | In-App |
|---|---|---|---|---|
| P0 (Critical) | Unlimited | Unlimited | Unlimited | Unlimited |
| P1 (Transactional) | 20 | 10 | 30 | Unlimited |
| P2 (Promotional) | 5 | 3 | 5 | 20 |
P0 notifications (OTP codes, security alerts, fraud warnings) always bypass rate limits. You never want a user locked out of their account because they hit a rate limit on verification codes.
Quiet Hours
If the user has configured quiet hours (e.g., 10 PM to 8 AM), non-critical notifications are deferred to the next available window. The Notification Service checks the user's timezone and quiet hours during the preference check step and adjusts the `scheduledAt` accordingly. P0 notifications ignore quiet hours.
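A sketch of the quiet-hours deferral, assuming quiet hours are stored as local wall-clock times alongside the user's IANA timezone (the function name is hypothetical). Note the window may cross midnight, as in the 10 PM–8 AM example:

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo


def defer_for_quiet_hours(send_at_utc: datetime, tz: str,
                          start: time, end: time) -> datetime:
    """If send_at_utc falls inside the user's quiet hours, push it to the
    end of the quiet window (in the user's timezone); otherwise return it
    unchanged. Handles windows that cross midnight, e.g. 22:00-08:00."""
    local = send_at_utc.astimezone(ZoneInfo(tz))
    t = local.time()
    if start <= end:
        in_quiet = start <= t < end
    else:  # window crosses midnight
        in_quiet = t >= start or t < end
    if not in_quiet:
        return send_at_utc
    release = local.replace(hour=end.hour, minute=end.minute,
                            second=0, microsecond=0)
    if release <= local:  # today's end time already passed -> tomorrow
        release += timedelta(days=1)
    return release.astimezone(ZoneInfo("UTC"))
```

A production version would also handle DST transitions at the window boundary; this sketch leaves that to `zoneinfo`'s defaults.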
Deep Dive 2: Retry and Delivery Guarantee
Our target is at-least-once delivery. This means every notification must either be successfully delivered or exhaust its retry budget before being marked as permanently failed.
Retry Strategy: Exponential Backoff with Jitter
When a channel worker fails to deliver a notification (provider timeout, 5xx response, network error), it re-enqueues the message with a delay:
- Attempt 1: Immediate
- Attempt 2: 30 seconds + random jitter (0-10s)
- Attempt 3: 2 minutes + random jitter (0-30s)
- Attempt 4: 10 minutes + random jitter (0-60s)
- Attempt 5: 1 hour (final attempt)
The jitter prevents thundering herd problems when a provider recovers from an outage and thousands of retries fire simultaneously.
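The schedule above can be encoded as a simple table; the function returns None once the retry budget is exhausted and the message should go to the DLQ (names are illustrative):

```python
import random

# (base delay seconds, max jitter seconds) per attempt, per the schedule above
RETRY_SCHEDULE = [
    (0, 0),      # attempt 1: immediate
    (30, 10),    # attempt 2: 30s + jitter(0-10s)
    (120, 30),   # attempt 3: 2min + jitter(0-30s)
    (600, 60),   # attempt 4: 10min + jitter(0-60s)
    (3600, 0),   # attempt 5: 1 hour, final
]


def retry_delay_seconds(attempt, rng=random.random):
    """Delay before the given attempt (1-based), or None when the retry
    budget is exhausted."""
    if attempt > len(RETRY_SCHEDULE):
        return None
    base, jitter = RETRY_SCHEDULE[attempt - 1]
    return base + rng() * jitter
```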
Dead Letter Queue
After exhausting all retry attempts, the notification moves to a Dead Letter Queue (DLQ). An operations team monitors the DLQ dashboard and can:
- Bulk retry — If the failure was due to a temporary provider outage, replay all DLQ messages.
- Route to alternate provider — Replay DLQ messages through the fallback provider.
- Mark as permanently failed — For invalid device tokens, unsubscribed phone numbers, or bounced email addresses.
Idempotency and Deduplication
At-least-once delivery means duplicates are possible. We mitigate this at two levels:
- Producer-level: The `idempotencyKey` in the send request prevents the same event from creating multiple notification records. The service stores processed keys in Redis with a 24-hour TTL.
- Worker-level: Each delivery attempt is logged with a `(notification_id, channel, attempt)` tuple. Before sending, the worker checks whether a successful delivery already exists for this notification and channel. If so, it skips the send.
Ensuring No Message Loss in the Queue
The queue itself must guarantee durability:
- Kafka: Use `acks=all` with a replication factor of 3. Messages are persisted to disk before acknowledgment. Consumer offsets are committed only after successful processing (at-least-once semantics).
- SQS: Messages are redundantly stored across multiple AZs. The visibility timeout ensures a message is redelivered if a consumer crashes before acknowledging it.
The Notification Service writes the notification to the database before enqueuing it. If the enqueue fails, a periodic reconciliation job scans for notifications in `created` status that are older than 5 minutes and re-enqueues them. This belt-and-suspenders approach ensures no notification is silently lost.
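The reconciliation scan itself is simple. A sketch over in-memory records — in production this would be a database query on status and creation time (field and function names are illustrative):

```python
from datetime import datetime, timedelta, timezone

STUCK_THRESHOLD = timedelta(minutes=5)


def find_stuck_notifications(records, now):
    """Return notifications still in 'created' status older than the
    threshold: written to the DB but apparently never enqueued, so the
    reconciliation job should re-enqueue them."""
    return [r for r in records
            if r["status"] == "created"
            and now - r["created_at"] > STUCK_THRESHOLD]
```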
Stage 6: Scaling and Trade-Offs
Provider Failover
External providers are the biggest source of unreliability in a notification system. A provider outage should not mean failed notifications — it should mean automatic rerouting.
The Provider Abstraction Layer maintains a health score per provider, updated via a circuit breaker pattern:
- Closed (healthy): All traffic goes to the primary provider. If the error rate exceeds 10% over 60 seconds, trip the circuit.
- Open (unhealthy): All traffic routes to the fallback provider. After 5 minutes, transition to half-open.
- Half-open: Send 10% of traffic to the primary provider. If success rate exceeds 90%, close the circuit (return to primary).
Health scores are stored in Redis and shared across all worker instances so that failover decisions are coordinated. This prevents a split-brain scenario where some workers send to the unhealthy provider while others have already failed over.
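A minimal sketch of the three-state machine, using plain success/failure counters in place of a production rolling window (thresholds match the ones above; class and method names are illustrative):

```python
import time


class CircuitBreaker:
    """Minimal three-state circuit breaker. Workers report outcomes via
    record(); is_primary_allowed() says whether to use the primary provider."""

    OPEN_COOLDOWN = 300          # seconds before open -> half-open
    ERROR_RATE_TO_TRIP = 0.10
    SUCCESS_RATE_TO_CLOSE = 0.90
    MIN_SAMPLES = 10             # don't decide on too little data

    def __init__(self, clock=time.time):
        self.state = "closed"
        self._clock = clock
        self._opened_at = 0.0
        self._successes = 0
        self._failures = 0

    def record(self, success: bool):
        self._successes += success
        self._failures += not success
        total = self._successes + self._failures
        if total < self.MIN_SAMPLES:
            return
        rate = self._successes / total
        if self.state == "closed" and (1 - rate) > self.ERROR_RATE_TO_TRIP:
            self._trip()
        elif self.state == "half_open":
            if rate >= self.SUCCESS_RATE_TO_CLOSE:
                self._reset("closed")
            else:
                self._trip()

    def is_primary_allowed(self) -> bool:
        if self.state == "open" and self._clock() - self._opened_at >= self.OPEN_COOLDOWN:
            self._reset("half_open")
        return self.state != "open"

    def _trip(self):
        self.state = "open"
        self._opened_at = self._clock()
        self._successes = self._failures = 0

    def _reset(self, state):
        self.state = state
        self._successes = self._failures = 0
```

In the shared-Redis variant, the counters and state live in Redis keys so every worker instance sees the same decision.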
Batching and Digest Mode
For high-volume notification types (e.g., social media "someone liked your post"), sending individual notifications for each event is wasteful and annoying. Instead, we aggregate events into digests:
- When a P2 notification arrives and the user already has unread notifications of the same type within the last hour, the service merges them: "3 people liked your post" instead of three separate notifications.
- For email, a digest job runs periodically (e.g., hourly or daily) and batches all pending notifications into a single email rendered from a digest template.
- Digest aggregation uses Redis sorted sets keyed by `digest:{user_id}:{category}`. Individual events are added as members. The digest job reads the set, renders the template, sends the notification, and clears the set.
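A sketch of the aggregation logic, specialized to the "liked your post" example and using an in-memory map in place of the Redis set (names are illustrative):

```python
from collections import defaultdict


class DigestBuffer:
    """In-memory stand-in for the Redis set at digest:{user_id}:{category},
    specialized to the 'liked your post' example."""

    def __init__(self):
        self._actors = defaultdict(list)  # user_id -> actors with pending likes

    def add(self, user_id, actor):
        self._actors[user_id].append(actor)

    def flush(self, user_id):
        """Drain pending events into one rendered message, or None if empty."""
        actors = self._actors.pop(user_id, [])
        if not actors:
            return None
        if len(actors) == 1:
            return f"{actors[0]} liked your post"
        return f"{len(actors)} people liked your post"
```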
Cross-Region Delivery
With 100M users globally, a single-region architecture introduces latency for distant users and creates a single point of failure.
- Multi-region queues: Deploy Kafka clusters in each major region (US, EU, APAC). The Notification Service routes messages to the queue in the user's home region based on their profile data.
- Regional channel workers: Each region has its own pool of workers. This reduces latency to providers (APNs/FCM have regional endpoints) and keeps data closer to users for compliance (GDPR — EU user data processed in EU).
- Global notification store: Use DynamoDB Global Tables or Cassandra multi-datacenter replication for the notification feed. Users traveling across regions can still access their notification history with low latency.
- Cross-region event forwarding: If a notification must be delivered to a user in a different region from the producer, the Notification Service publishes to a cross-region event bus (EventBridge or Kafka MirrorMaker) rather than sending directly. The destination region's workers handle actual delivery.
Capacity Planning
At 500M notifications/day across four channels:
- Push: ~200M/day (40%). FCM and APNs can handle millions per second. The bottleneck is our worker throughput, not the provider. Budget 50 push workers at peak.
- Email: ~150M/day (30%). SES supports 50,000/second in production. Budget 30 email workers.
- In-app: ~120M/day (24%). Direct database writes, no external provider. Budget 20 workers.
- SMS: ~30M/day (6%). SMS is expensive ($0.01-0.05 per message) and slow (provider rate limits). Budget 20 SMS workers with careful rate management.
Worker auto-scaling is driven by queue depth metrics. When the P0 queue depth exceeds 1,000, scale up immediately. For P2, tolerate higher queue depths before scaling to control costs.
Trade-Off Summary
| Decision | Choice | Trade-Off |
|---|---|---|
| Delivery guarantee | At-least-once | Possible duplicates, but no missed notifications. Dedup logic mitigates most duplicates. |
| Queue architecture | Separate queues per priority | More infrastructure to manage, but P0 latency is never impacted by P2 volume spikes. |
| Rate limiting | Sliding window (Redis) | Adds ~1ms latency per notification. Redis becomes a dependency, but it is already in the critical path for dedup. |
| Template rendering | Server-side | Producers cannot customize delivery content, but this ensures brand consistency and prevents injection. |
| Provider abstraction | Circuit breaker failover | Adds complexity, but provider outages become transparent to the rest of the system. |
| Digest mode | Aggregation in Redis | Delayed delivery for low-priority notifications. Acceptable for P2; P0/P1 are never digested. |
Scoring Tips
To score well on a notification service design question, keep these principles in mind:
- Start with channels and priorities. This immediately shows you understand that not all notifications are equal. A security alert and a marketing email have fundamentally different SLAs, and your architecture should reflect that.
- Emphasize the provider abstraction. Interviewers love to ask "what happens when your SMS provider goes down?" If you have already described a circuit breaker with automatic failover, you have answered the question before it is asked.
- Show the queue is the backbone. The priority queue decouples ingestion from delivery, absorbs traffic spikes, enables retries, and allows independent scaling of each channel. Make it clear that the queue is not an afterthought — it is the core architectural decision.
- Rate limiting is a feature, not a footnote. Many candidates mention rate limiting in passing. Go deep: explain the algorithm (sliding window vs. token bucket), the granularity (per-user, per-channel, per-category), and the exceptions (P0 bypasses limits). This signals real-world maturity.
- Address the failure path explicitly. Walk through what happens when a delivery attempt fails: retry with backoff, DLQ after exhaustion, reconciliation job for lost messages. Production systems are defined by how they handle failure, not how they handle the happy path.
- Do not forget user preferences. A notification service that ignores user opt-outs is not just bad design — it violates regulations like GDPR and CAN-SPAM. Mention quiet hours and per-category controls to demonstrate you think about the user experience.
Practice delivering this walkthrough in under 35 minutes, explaining each stage concisely while leaving room for interviewer follow-ups. Tools like Hoppers AI can help you rehearse with real-time feedback on structure, depth, and pacing — so you walk into the interview already knowing your weak spots.