Design a Notification Service — Complete System Design Walkthrough
Notification services are a staple of system design interviews because they touch on event-driven architecture, multi-channel delivery, rate limiting, and reliability at scale. Every major platform — from e-commerce to social media to fintech — relies on a notification system to keep users informed. In this walkthrough, we will design one from the ground up, following the structured 6-stage approach that top interviewers expect.
Stage 1: Requirements Gathering
Start by clarifying the scope. A notification service can mean many things — from a simple email sender to a full platform handling billions of events across multiple channels. Spend 3-5 minutes aligning with your interviewer on what matters most.
Functional Requirements
- Multi-channel delivery — Support four channels: push notifications (APNs/FCM), SMS, email, and in-app notifications.
- User preferences — Users can opt in or out of specific notification types per channel (e.g., receive marketing emails but not marketing push notifications).
- Template system — Notifications are rendered from reusable templates with dynamic variables (user name, order ID, etc.), not hardcoded strings.
- Rate limiting — Protect users from notification fatigue. Enforce per-user limits (e.g., no more than 5 push notifications per hour) and global limits per notification type.
- Priority levels — Support at least three priority tiers: P0 (critical — security alerts, OTP), P1 (transactional — order confirmations, shipping updates), P2 (promotional — marketing campaigns, recommendations).
- Delivery tracking — Track the status of every notification through its lifecycle: created, queued, sent, delivered, failed, read.
- Scheduling — Support sending notifications at a future time (e.g., "send this campaign at 9 AM in the user's local timezone").
Non-Functional Requirements
- Scale: 100 million registered users, 500 million notifications per day (~5,800 notifications/second average, ~20,000/second at peak).
- Latency: P0 notifications delivered within 5 seconds. P1 within 30 seconds. P2 best-effort within minutes.
- Availability: 99.99% — missed notifications damage user trust and can have financial or security implications.
- Reliability: At-least-once delivery guarantee. Duplicate delivery is preferable to missed delivery, but we should minimize duplicates.
- Extensibility: Adding a new channel (e.g., WhatsApp, Slack) should require implementing a worker, not redesigning the architecture.
Interview tip: Explicitly stating priority tiers and per-channel delivery SLAs demonstrates that you think about notification systems the way production teams do. Interviewers at companies like Google, Meta, and Amazon will immediately recognize this as a signal of real-world experience.
Stage 2: API Design
The notification service exposes APIs for producers (internal services that trigger notifications) and for end users (managing preferences and viewing notification history).
Producer APIs (Internal)
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /v1/notifications/send | Send a single notification |
| POST | /v1/notifications/batch | Send to multiple recipients (campaigns) |
| GET | /v1/notifications/{id}/status | Query delivery status |
Send Notification Payload
```json
{
  "templateId": "order_shipped",
  "userId": "u_abc123",
  "channels": ["push", "email"],
  "priority": "P1",
  "data": {
    "orderId": "ORD-7891",
    "trackingUrl": "https://..."
  },
  "scheduledAt": null,
  "idempotencyKey": "ship-ORD-7891"
}
```
The `idempotencyKey` prevents duplicate notifications when producers retry after network failures. The service deduplicates by storing processed keys in a TTL-based cache (Redis) for 24 hours.
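The dedup check can be sketched as follows — a minimal in-memory stand-in for the Redis cache, with an injectable clock so the TTL behavior is testable (class and method names are illustrative, not part of any real API):

```python
import time

IDEMPOTENCY_TTL_SECONDS = 24 * 3600  # keys expire after 24 hours


class IdempotencyCache:
    """In-memory stand-in for Redis `SET key NX EX`: claim() returns True
    only the first time a key is seen within the TTL window."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._seen = {}  # idempotency_key -> expiry timestamp

    def claim(self, key: str) -> bool:
        now = self._clock()
        expiry = self._seen.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate within 24h: drop the request
        self._seen[key] = now + IDEMPOTENCY_TTL_SECONDS
        return True
```

With Redis, the equivalent check is a single atomic command (`SET key 1 NX EX 86400`), which makes it safe across many stateless service instances.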
User-Facing APIs
| Method | Endpoint | Purpose |
|---|---|---|
| GET | /v1/users/{id}/notifications?cursor=&limit=20 | Paginated in-app notification feed |
| PUT | /v1/users/{id}/preferences | Update notification preferences per channel and type |
| GET | /v1/users/{id}/preferences | Retrieve current preferences |
| POST | /v1/users/{userId}/notifications/{notificationId}/read | Mark a notification as read |
Batch Send Payload
```json
{
  "templateId": "weekly_digest",
  "userIds": ["u_1", "u_2", ...],
  "channels": ["email"],
  "priority": "P2",
  "data": { "weekStart": "2026-03-06" },
  "scheduledAt": "2026-03-12T09:00:00Z"
}
```
For large campaigns (millions of recipients), the batch endpoint accepts a segment ID instead of a user list. The Notification Service resolves the segment asynchronously by querying the user service.
Design decision: Why not let producers specify the notification body directly? Templates enforce brand consistency, prevent injection vulnerabilities, and allow non-engineers (product, marketing) to update copy without code changes. The producer sends structured data; the Notification Service handles rendering.
Stage 3: Data Model
We need storage for four distinct concerns: notification records, templates, user preferences, and delivery logs. Each has different access patterns and scale characteristics.
Notifications Table (Cassandra / DynamoDB)
| Column | Type | Role |
|---|---|---|
| notification_id | UUID | Partition key |
| user_id | UUID | Indexed (for feed queries) |
| template_id | string | |
| channels | list<string> | Target channels |
| priority | enum (P0, P1, P2) | |
| status | enum | created, queued, sent, delivered, failed, read |
| data | JSON | Template variables |
| created_at | timestamp | |
| scheduled_at | timestamp (nullable) | |
| idempotency_key | string | Deduplication |
For the in-app notification feed, we need efficient queries by `user_id` ordered by `created_at`. In DynamoDB, this means a GSI with `PK = USER#{user_id}` and `SK = NOTIF#{created_at}#{notification_id}`. In Cassandra, partition by `user_id` with clustering on `created_at DESC`.
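A small sketch of the key-building logic (function names hypothetical): zero-padding the timestamp makes lexicographic order match chronological order, which is exactly what a descending range query on the sort key relies on.

```python
def feed_partition_key(user_id: str) -> str:
    """Build the GSI partition key USER#{user_id}."""
    return f"USER#{user_id}"


def feed_sort_key(created_at_epoch_ms: int, notification_id: str) -> str:
    """Build the GSI sort key NOTIF#{created_at}#{notification_id}.

    The timestamp is zero-padded to a fixed width so that string order
    equals time order; the notification ID suffix breaks ties between
    notifications created in the same millisecond."""
    return f"NOTIF#{created_at_epoch_ms:013d}#{notification_id}"
```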
Templates Table (PostgreSQL / DynamoDB)
| Column | Type |
|---|---|
| template_id | string (PK) |
| version | integer |
| channel | enum |
| subject | string (nullable, for email) |
| body | text (with placeholders like {{userName}}) |
| created_at | timestamp |
| updated_at | timestamp |
Templates are versioned so that in-flight notifications always render with the template that was active when they were created, not a newer version that may have changed the messaging.
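As an illustration, a minimal renderer for the `{{placeholder}}` syntax might look like the sketch below. In production you would use a proper template engine with escaping; this version simply fails loudly on a missing variable rather than sending a half-rendered notification.

```python
import re

# Matches {{name}} with optional internal whitespace, e.g. {{ userName }}
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(body: str, data: dict) -> str:
    """Substitute {{placeholder}} tokens with values from `data`.

    Raises KeyError on a missing variable so a bad payload is rejected
    instead of producing a broken message."""
    def replace(match):
        name = match.group(1)
        if name not in data:
            raise KeyError(f"missing template variable: {name}")
        return str(data[name])

    return _PLACEHOLDER.sub(replace, body)
```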
User Preferences Table
| Column | Type |
|---|---|
| user_id | UUID (PK) |
| channel | enum (push, sms, email, in_app) |
| category | string (transactional, marketing, social, security) |
| enabled | boolean |
| quiet_hours_start | time (nullable) |
| quiet_hours_end | time (nullable) |
| timezone | string |
The composite key is `(user_id, channel, category)`. This allows fine-grained control: a user can receive transactional push notifications but opt out of marketing pushes, while still receiving marketing emails.
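The preference check then reduces to a lookup keyed by (channel, category). A sketch, assuming an opt-out model where a missing row means "enabled" (the function name and data shape are illustrative):

```python
def enabled_channels(requested, category, prefs):
    """Filter the requested channels down to those the user has enabled
    for this notification category.

    `prefs` maps (channel, category) -> enabled flag; a missing entry
    defaults to enabled (opt-out model)."""
    return [ch for ch in requested if prefs.get((ch, category), True)]
```

If the returned list is empty, the notification is marked skipped and processing stops.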
Delivery Logs Table (Cassandra / S3 for archival)
| Column | Type |
|---|---|
| notification_id | UUID |
| channel | enum |
| attempt | integer |
| status | enum (sent, delivered, bounced, failed) |
| provider_response | JSON |
| attempted_at | timestamp |
Delivery logs are write-heavy and append-only — a natural fit for Cassandra or a time-series store. Older logs (beyond 30 days) can be archived to S3 in Parquet format for cost-efficient analytics.
Stage 4: High-Level Architecture
The architecture follows an event-driven, queue-based pattern that decouples notification ingestion from delivery. This is essential for handling bursty traffic (e.g., a flash sale generating millions of notifications simultaneously).
End-to-End Flow
- Ingestion — A producer sends a request to the Notification Service via the REST API. The service validates the payload, checks the idempotency key, and persists the notification record with status `created`.
- Preference check — The service queries user preferences to determine which channels are enabled for this notification category. If the user has opted out of all requested channels, the notification is marked `skipped` and no further processing occurs.
- Template rendering — The service fetches the template for each enabled channel and renders it with the provided data variables. Each channel may have a different template (push notifications are short; emails are rich HTML).
- Rate limit check — The service checks per-user rate limits using a sliding window counter in Redis. If the limit is exceeded, P2 notifications are deferred (re-queued with a delay); P0 notifications bypass rate limiting entirely, and P1 limits are generous enough that transactional traffic rarely hits them.
- Enqueue — The rendered notification is placed onto a priority queue (Kafka or SQS with separate queues per priority). P0 messages go to a dedicated high-priority topic with more consumers.
- Channel workers consume from the queue and dispatch to the appropriate external provider: APNs/FCM for push, Twilio/AWS SNS for SMS, SES/SendGrid for email, or write directly to the in-app feed store.
- Delivery tracking — Each worker writes a delivery log entry. For push and email, the worker processes webhook callbacks from providers (delivery receipts, bounces, complaints) and updates the notification status.
Component Responsibilities
| Component | Responsibility | Tech Choice |
|---|---|---|
| Notification Service | Validation, dedup, preferences, templating, rate limiting, routing | Stateless microservice (horizontally scalable) |
| Priority Queue | Decouple ingestion from delivery, priority-based consumption | Kafka (3 topics: P0, P1, P2) or SQS with separate queues |
| Channel Workers | Channel-specific delivery logic, provider SDK integration | Consumer groups, auto-scaled by queue depth |
| Provider Abstraction | Unified interface over multiple providers per channel | Strategy pattern (e.g., EmailProvider interface with SES and SendGrid implementations) |
| Delivery Tracker | Webhook ingestion, status updates, analytics | Separate service consuming provider callbacks |
| Scheduler | Hold scheduled notifications and release at the right time | DynamoDB TTL + Lambda trigger, or a delayed queue |
Provider Abstraction Layer
Each channel worker does not talk to a specific provider directly. Instead, it calls a Provider Abstraction Layer that exposes a uniform interface:
```typescript
interface NotificationProvider {
  send(recipient: string, content: RenderedContent): DeliveryResult;
}
```
Behind this interface, we can have multiple implementations per channel. For email: SES (primary), SendGrid (fallback). For SMS: Twilio (primary), AWS SNS (fallback). This abstraction is critical for provider failover — if one provider's API is degraded, we route traffic to the backup without changing any worker code.
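A Python sketch of this primary/fallback strategy, with stub types standing in for real provider SDKs (all names here are illustrative):

```python
from dataclasses import dataclass


@dataclass
class DeliveryResult:
    ok: bool
    provider: str
    detail: str = ""


class FailoverSender:
    """Try the primary provider first; on failure, fall back.

    Each provider is any object exposing
    send(recipient, content) -> DeliveryResult, mirroring the
    NotificationProvider interface above."""

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback

    def send(self, recipient: str, content: str) -> DeliveryResult:
        result = self.primary.send(recipient, content)
        if result.ok:
            return result
        # Primary failed: route the same message to the backup provider.
        return self.fallback.send(recipient, content)
```

In practice the routing decision also consults the circuit-breaker health scores described in Stage 6, rather than retrying the primary on every send.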
Stage 5: Deep Dive
We will go deep on two critical subsystems: per-user rate limiting and the retry/delivery guarantee mechanism.
Deep Dive 1: Rate Limiting Per User
Notification fatigue is real. Bombarding users with excessive notifications leads to app uninstalls, email unsubscribes, and brand damage. Rate limiting is not just a technical feature — it is a business requirement.
Sliding Window Counter (Redis)
We implement a sliding window log algorithm using Redis sorted sets. For each user and channel combination:
- Key: `ratelimit:{user_id}:{channel}`
- Members: notification timestamps (stored as both score and value)
- Operations per notification:
  - `ZREMRANGEBYSCORE` — remove entries older than the window (e.g., 1 hour ago)
  - `ZCARD` — count the remaining entries; if the count is at the limit (e.g., 5 for push), reject or defer
  - `ZADD` — add the current timestamp if the notification is allowed
- TTL: set a TTL on the key equal to the window size to prevent memory leaks from inactive users.
All three operations are wrapped in a Lua script for atomicity:
```lua
-- KEYS[1] = ratelimit:{user_id}:{channel}
-- ARGV[1] = window start (now - window size)
-- ARGV[2] = limit
-- ARGV[3] = current timestamp
-- ARGV[4] = ttl_seconds
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, ARGV[1])
local count = redis.call('ZCARD', KEYS[1])
if count < tonumber(ARGV[2]) then
  redis.call('ZADD', KEYS[1], ARGV[3], ARGV[3])
  redis.call('EXPIRE', KEYS[1], ARGV[4])
  return 1
end
return 0
```
Invoked as `EVAL <script> 1 ratelimit:{user_id}:push {window_start} {limit} {now} {ttl_seconds}`; a return value of 1 means the notification is allowed.
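For illustration, the same sliding-window log logic in plain Python — an in-memory stand-in for the Redis sorted set, useful for unit-testing the policy (names are illustrative):

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory equivalent of the Redis sorted-set limiter: keep one
    timestamp per allowed notification and count those inside the window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def allow(self, key: str, now: float) -> bool:
        events = self._events[key]
        # ZREMRANGEBYSCORE: drop timestamps that fell out of the window
        while events and events[0] <= now - self.window:
            events.popleft()
        # ZCARD: reject if the window is already full
        if len(events) >= self.limit:
            return False
        # ZADD: record this notification
        events.append(now)
        return True
```

Unlike the Lua version, this is per-process state; the Redis script exists precisely so that all service instances share one atomic counter.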
Tiered Rate Limits
| Priority | Push (per hour) | SMS (per day) | Email (per day) | In-App |
|---|---|---|---|---|
| P0 (Critical) | Unlimited | Unlimited | Unlimited | Unlimited |
| P1 (Transactional) | 20 | 10 | 30 | Unlimited |
| P2 (Promotional) | 5 | 3 | 5 | 20 |
P0 notifications (OTP codes, security alerts, fraud warnings) always bypass rate limits. You never want a user locked out of their account because they hit a rate limit on verification codes.
Quiet Hours
If the user has configured quiet hours (e.g., 10 PM to 8 AM), non-critical notifications are deferred to the next available window. The Notification Service checks the user's timezone and quiet hours during the preference check step and adjusts the `scheduledAt` accordingly. P0 notifications ignore quiet hours.
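A sketch of the quiet-hours deferral, assuming quiet hours are stored as local wall-clock times alongside the user's IANA timezone (the function name is hypothetical). Note the window may cross midnight, as in the 10 PM–8 AM example:

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo


def defer_for_quiet_hours(send_at_utc: datetime, tz: str,
                          start: time, end: time) -> datetime:
    """If send_at_utc falls inside the user's quiet hours, push it to the
    end of the quiet window (in the user's timezone); otherwise return it
    unchanged. Handles windows that cross midnight, e.g. 22:00-08:00."""
    local = send_at_utc.astimezone(ZoneInfo(tz))
    t = local.time()
    if start <= end:
        in_quiet = start <= t < end
    else:  # window crosses midnight
        in_quiet = t >= start or t < end
    if not in_quiet:
        return send_at_utc
    release = local.replace(hour=end.hour, minute=end.minute,
                            second=0, microsecond=0)
    if release <= local:  # today's end time already passed -> tomorrow
        release += timedelta(days=1)
    return release.astimezone(ZoneInfo("UTC"))
```

A production version would also handle DST transitions at the window boundary; this sketch leaves that to `zoneinfo`'s defaults.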
Deep Dive 2: Retry and Delivery Guarantee
Our target is at-least-once delivery. This means every notification must either be successfully delivered or exhaust its retry budget before being marked as permanently failed.
Retry Strategy: Exponential Backoff with Jitter
When a channel worker fails to deliver a notification (provider timeout, 5xx response, network error), it re-enqueues the message with a delay:
- Attempt 1: Immediate
- Attempt 2: 30 seconds + random jitter (0-10s)
- Attempt 3: 2 minutes + random jitter (0-30s)
- Attempt 4: 10 minutes + random jitter (0-60s)
- Attempt 5: 1 hour (final attempt)
The jitter prevents thundering herd problems when a provider recovers from an outage and thousands of retries fire simultaneously.
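The schedule above can be encoded as a simple table; the function returns None once the retry budget is exhausted and the message should go to the DLQ (names are illustrative):

```python
import random

# (base delay seconds, max jitter seconds) per attempt, per the schedule above
RETRY_SCHEDULE = [
    (0, 0),      # attempt 1: immediate
    (30, 10),    # attempt 2: 30s + jitter(0-10s)
    (120, 30),   # attempt 3: 2min + jitter(0-30s)
    (600, 60),   # attempt 4: 10min + jitter(0-60s)
    (3600, 0),   # attempt 5: 1 hour, final
]


def retry_delay_seconds(attempt, rng=random.random):
    """Delay before the given attempt (1-based), or None when the retry
    budget is exhausted."""
    if attempt > len(RETRY_SCHEDULE):
        return None
    base, jitter = RETRY_SCHEDULE[attempt - 1]
    return base + rng() * jitter
```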
Dead Letter Queue
After exhausting all retry attempts, the notification moves to a Dead Letter Queue (DLQ). An operations team monitors the DLQ dashboard and can:
- Bulk retry — If the failure was due to a temporary provider outage, replay all DLQ messages.
- Route to alternate provider — Replay DLQ messages through the fallback provider.
- Mark as permanently failed — For invalid device tokens, unsubscribed phone numbers, or bounced email addresses.
Idempotency and Deduplication
At-least-once delivery means duplicates are possible. We mitigate this at two levels:
- Producer-level: The `idempotencyKey` in the send request prevents the same event from creating multiple notification records. The service stores processed keys in Redis with a 24-hour TTL.
- Worker-level: Each delivery attempt is logged with a `(notification_id, channel, attempt)` tuple. Before sending, the worker checks whether a successful delivery already exists for this notification and channel. If so, it skips the send.
Ensuring No Message Loss in the Queue
The queue itself must guarantee durability:
- Kafka: Use `acks=all` with a replication factor of 3. Messages are persisted to disk before acknowledgment. Consumer offsets are committed only after successful processing (at-least-once semantics).
- SQS: Messages are redundantly stored across multiple AZs. The visibility timeout ensures a message is redelivered if a consumer crashes before acknowledging it.
The Notification Service writes the notification to the database before enqueuing it. If the enqueue fails, a periodic reconciliation job scans for notifications in `created` status that are older than 5 minutes and re-enqueues them. This belt-and-suspenders approach ensures no notification is silently lost.
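The reconciliation scan itself is simple. A sketch over in-memory records — in production this would be a database query on status and creation time (field and function names are illustrative):

```python
from datetime import datetime, timedelta, timezone

STUCK_THRESHOLD = timedelta(minutes=5)


def find_stuck_notifications(records, now):
    """Return notifications still in 'created' status older than the
    threshold: written to the DB but apparently never enqueued, so the
    reconciliation job should re-enqueue them."""
    return [r for r in records
            if r["status"] == "created"
            and now - r["created_at"] > STUCK_THRESHOLD]
```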
Stage 6: Scaling and Trade-Offs
Provider Failover
External providers are the biggest source of unreliability in a notification system. A provider outage should not mean failed notifications — it should mean automatic rerouting.
The Provider Abstraction Layer maintains a health score per provider, updated via a circuit breaker pattern:
- Closed (healthy): All traffic goes to the primary provider. If the error rate exceeds 10% over 60 seconds, trip the circuit.
- Open (unhealthy): All traffic routes to the fallback provider. After 5 minutes, transition to half-open.
- Half-open: Send 10% of traffic to the primary provider. If success rate exceeds 90%, close the circuit (return to primary).
Health scores are stored in Redis and shared across all worker instances so that failover decisions are coordinated. This prevents a split-brain scenario where some workers send to the unhealthy provider while others have already failed over.
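A minimal sketch of the three-state machine, using plain success/failure counters in place of a production rolling window (thresholds match the ones above; class and method names are illustrative):

```python
import time


class CircuitBreaker:
    """Minimal three-state circuit breaker. Workers report outcomes via
    record(); is_primary_allowed() says whether to use the primary provider."""

    OPEN_COOLDOWN = 300          # seconds before open -> half-open
    ERROR_RATE_TO_TRIP = 0.10
    SUCCESS_RATE_TO_CLOSE = 0.90
    MIN_SAMPLES = 10             # don't decide on too little data

    def __init__(self, clock=time.time):
        self.state = "closed"
        self._clock = clock
        self._opened_at = 0.0
        self._successes = 0
        self._failures = 0

    def record(self, success: bool):
        self._successes += success
        self._failures += not success
        total = self._successes + self._failures
        if total < self.MIN_SAMPLES:
            return
        rate = self._successes / total
        if self.state == "closed" and (1 - rate) > self.ERROR_RATE_TO_TRIP:
            self._trip()
        elif self.state == "half_open":
            if rate >= self.SUCCESS_RATE_TO_CLOSE:
                self._reset("closed")
            else:
                self._trip()

    def is_primary_allowed(self) -> bool:
        if self.state == "open" and self._clock() - self._opened_at >= self.OPEN_COOLDOWN:
            self._reset("half_open")
        return self.state != "open"

    def _trip(self):
        self.state = "open"
        self._opened_at = self._clock()
        self._successes = self._failures = 0

    def _reset(self, state):
        self.state = state
        self._successes = self._failures = 0
```

In the shared-Redis variant, the counters and state live in Redis keys so every worker instance sees the same decision.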
Batching and Digest Mode
For high-volume notification types (e.g., social media "someone liked your post"), sending individual notifications for each event is wasteful and annoying. Instead, we aggregate events into digests:
- When a P2 notification arrives and the user already has unread notifications of the same type within the last hour, the service merges them: "3 people liked your post" instead of three separate notifications.
- For email, a digest job runs periodically (e.g., hourly or daily) and batches all pending notifications into a single email rendered from a digest template.
- Digest aggregation uses Redis sorted sets keyed by `digest:{user_id}:{category}`. Individual events are added as members. The digest job reads the set, renders the template, sends the notification, and clears the set.
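A sketch of the aggregation logic, specialized to the "liked your post" example and using an in-memory map in place of the Redis set (names are illustrative):

```python
from collections import defaultdict


class DigestBuffer:
    """In-memory stand-in for the Redis set at digest:{user_id}:{category},
    specialized to the 'liked your post' example."""

    def __init__(self):
        self._actors = defaultdict(list)  # user_id -> actors with pending likes

    def add(self, user_id, actor):
        self._actors[user_id].append(actor)

    def flush(self, user_id):
        """Drain pending events into one rendered message, or None if empty."""
        actors = self._actors.pop(user_id, [])
        if not actors:
            return None
        if len(actors) == 1:
            return f"{actors[0]} liked your post"
        return f"{len(actors)} people liked your post"
```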
Cross-Region Delivery
With 100M users globally, a single-region architecture introduces latency for distant users and creates a single point of failure.
- Multi-region queues: Deploy Kafka clusters in each major region (US, EU, APAC). The Notification Service routes messages to the queue in the user's home region based on their profile data.
- Regional channel workers: Each region has its own pool of workers. This reduces latency to providers (APNs/FCM have regional endpoints) and keeps data closer to users for compliance (GDPR — EU user data processed in EU).
- Global notification store: Use DynamoDB Global Tables or Cassandra multi-datacenter replication for the notification feed. Users traveling across regions can still access their notification history with low latency.
- Cross-region event forwarding: If a notification must be delivered to a user in a different region from the producer, the Notification Service publishes to a cross-region event bus (EventBridge or Kafka MirrorMaker) rather than sending directly. The destination region's workers handle actual delivery.
Capacity Planning
At 500M notifications/day across four channels:
- Push: ~200M/day (40%). FCM and APNs can handle millions per second. The bottleneck is our worker throughput, not the provider. Budget 50 push workers at peak.
- Email: ~150M/day (30%). SES supports 50,000/second in production. Budget 30 email workers.
- In-app: ~120M/day (24%). Direct database writes, no external provider. Budget 20 workers.
- SMS: ~30M/day (6%). SMS is expensive ($0.01-0.05 per message) and slow (provider rate limits). Budget 20 SMS workers with careful rate management.
Worker auto-scaling is driven by queue depth metrics. When the P0 queue depth exceeds 1,000, scale up immediately. For P2, tolerate higher queue depths before scaling to control costs.
Trade-Off Summary
| Decision | Choice | Trade-Off |
|---|---|---|
| Delivery guarantee | At-least-once | Possible duplicates, but no missed notifications. Dedup logic mitigates most duplicates. |
| Queue architecture | Separate queues per priority | More infrastructure to manage, but P0 latency is never impacted by P2 volume spikes. |
| Rate limiting | Sliding window (Redis) | Adds ~1ms latency per notification. Redis becomes a dependency, but it is already in the critical path for dedup. |
| Template rendering | Server-side | Producers cannot customize delivery content, but this ensures brand consistency and prevents injection. |
| Provider abstraction | Circuit breaker failover | Adds complexity, but provider outages become transparent to the rest of the system. |
| Digest mode | Aggregation in Redis | Delayed delivery for low-priority notifications. Acceptable for P2; P0/P1 are never digested. |
Scoring Tips
To score well on a notification service design question, keep these principles in mind:
- Start with channels and priorities. This immediately shows you understand that not all notifications are equal. A security alert and a marketing email have fundamentally different SLAs, and your architecture should reflect that.
- Emphasize the provider abstraction. Interviewers love to ask "what happens when your SMS provider goes down?" If you have already described a circuit breaker with automatic failover, you have answered the question before it is asked.
- Show the queue is the backbone. The priority queue decouples ingestion from delivery, absorbs traffic spikes, enables retries, and allows independent scaling of each channel. Make it clear that the queue is not an afterthought — it is the core architectural decision.
- Rate limiting is a feature, not a footnote. Many candidates mention rate limiting in passing. Go deep: explain the algorithm (sliding window vs. token bucket), the granularity (per-user, per-channel, per-category), and the exceptions (P0 bypasses limits). This signals real-world maturity.
- Address the failure path explicitly. Walk through what happens when a delivery attempt fails: retry with backoff, DLQ after exhaustion, reconciliation job for lost messages. Production systems are defined by how they handle failure, not how they handle the happy path.
- Do not forget user preferences. A notification service that ignores user opt-outs is not just bad design — it violates regulations like GDPR and CAN-SPAM. Mention quiet hours and per-category controls to demonstrate you think about the user experience.
Practice delivering this walkthrough in under 35 minutes, explaining each stage concisely while leaving room for interviewer follow-ups. Tools like Hoppers AI can help you rehearse with real-time feedback on structure, depth, and pacing — so you walk into the interview already knowing your weak spots.