    Hoppers AI Team · March 12, 2026 · 12 min read

    Design a URL Shortener — Complete System Design Walkthrough

    The URL shortener is one of the most frequently asked system design questions at top-tier companies — appearing in interviews at Google, Meta, Amazon, and countless startups. It appears deceptively simple — take a long URL, return a short one — but a strong answer demands careful reasoning about encoding schemes, read-heavy traffic patterns, caching hierarchies, and analytics pipelines. The question tests whether you can navigate trade-offs across storage, computation, and networking under realistic constraints.

    This guide walks through the problem in six stages, mirroring the structure interviewers expect. Each stage builds on the previous one, and we call out the exact moments where candidates typically score points or lose them.

    1. Requirements

    Start every system design interview by clarifying requirements. Do not jump into architecture. Spend 3-5 minutes here — it signals maturity and prevents you from solving the wrong problem.

    Functional Requirements

    • Shorten: Given a long URL, generate a unique short URL (e.g., https://short.ly/aB3kQ7).
    • Redirect: When a user visits a short URL, redirect them to the original long URL via HTTP 301 or 302.
    • Custom aliases: Allow users to optionally specify a custom short code (e.g., short.ly/my-brand).
    • Expiration: Support optional TTL on short URLs. Default lifetime: 5 years.
    • Analytics: Track click counts, referrers, geolocation, and timestamps per short URL.

    Non-Functional Requirements

    • Scale: 100 million new URLs created per month (~40 URLs/second write). Read-to-write ratio of 100:1 means ~4,000 redirects/second.
    • Latency: Redirect latency under 50ms at p99.
    • Availability: 99.99% uptime — a redirect failure is a broken link on the internet.
    • Durability: Once created, a short URL must never lose its mapping.
    • URL length: Short codes should be 7 characters or fewer.
    Why these numbers matter: Stating concrete figures shows the interviewer you can reason about capacity. 100M URLs/month over 5 years = 6 billion records. At ~500 bytes per record, that is roughly 3 TB of mapping data — comfortably within a single well-partitioned database.

    Back-of-Envelope Estimates

    Metric                                 | Value
    New URLs/month                         | 100M
    Write QPS (avg)                        | ~40
    Read QPS (avg)                         | ~4,000
    Peak read QPS (5x)                     | ~20,000
    Storage (5 years)                      | ~3 TB
    Short code keyspace (base-62, 7 chars) | ~3.5 trillion
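
    To sanity-check these figures, the arithmetic fits in a few lines. The sketch below simply re-derives the table from the assumptions stated above; the constants are the same ones quoted in the requirements.

    SECONDS_PER_MONTH = 30 * 24 * 3600            # ~2.6 million seconds
    new_urls_per_month = 100_000_000
    read_to_write_ratio = 100
    bytes_per_record = 500                        # rough average mapping size
    retention_years = 5

    write_qps = new_urls_per_month / SECONDS_PER_MONTH          # ~40 writes/second
    read_qps = write_qps * read_to_write_ratio                   # ~4,000 reads/second
    total_records = new_urls_per_month * 12 * retention_years    # 6 billion mappings
    storage_tb = total_records * bytes_per_record / 1e12         # ~3 TB

    print(f"write QPS ~{write_qps:.0f}, read QPS ~{read_qps:.0f}")
    print(f"records: {total_records:,}, storage: ~{storage_tb:.1f} TB")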

    2. API Design

    Three endpoints cover the core surface area. Keep the API RESTful and versioned. A common mistake is designing too many endpoints — resist the urge to add a PUT for updating URLs or a GET for listing all URLs until the interviewer asks for it. Start minimal and expand.

    POST /api/v1/urls — Create Short URL

    Request:

    {
      "longUrl": "https://example.com/very/long/path?query=value",
      "customAlias": "my-brand",   // optional
      "expiresAt": "2027-03-12T00:00:00Z"  // optional
    }

    Response (201 Created):

    {
      "shortCode": "my-brand",
      "shortUrl": "https://short.ly/my-brand",
      "longUrl": "https://example.com/very/long/path?query=value",
      "expiresAt": "2027-03-12T00:00:00Z",
      "createdAt": "2026-03-12T10:30:00Z"
    }

    GET /{shortCode} — Redirect

    Returns HTTP 302 Found with Location: {longUrl}. We prefer 302 over 301 because 301 causes browsers to cache the redirect permanently, which would prevent us from collecting analytics and honoring expiration changes. If analytics are not required and you want maximum CDN cacheability, 301 is the better choice — state your reasoning explicitly.
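
    As a concrete illustration, here is a minimal sketch of the redirect handler, assuming a Python service built on Flask; lookup_long_url is a placeholder for the cache/database lookup covered in the architecture section.

    from flask import Flask, abort, redirect

    app = Flask(__name__)

    def lookup_long_url(short_code):
        """Placeholder: check Redis first, then the database (see sections 4-5)."""
        return None

    @app.route("/<short_code>")
    def follow(short_code):
        long_url = lookup_long_url(short_code)
        if long_url is None:
            abort(404)
        # 302 keeps browsers coming back to us, so clicks stay countable and
        # expiration changes take effect; a 301 would be cached permanently.
        response = redirect(long_url, code=302)
        response.headers["Cache-Control"] = "public, max-age=3600"  # CDN-friendly
        return response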

    GET /api/v1/urls/{shortCode}/stats — Analytics

    Response:

    {
      "shortCode": "aB3kQ7",
      "totalClicks": 284910,
      "clicksByDay": [...],
      "topReferrers": [...],
      "topCountries": [...]
    }

    3. Data Model

    URL Mapping Table

    Column     | Type           | Notes
    short_code | VARCHAR(7), PK | Base-62 encoded or custom alias
    long_url   | TEXT           | Original URL, indexed for dedup
    user_id    | VARCHAR(36)    | Owner (nullable for anonymous)
    created_at | TIMESTAMP      | Creation time
    expires_at | TIMESTAMP      | Nullable; default = created_at + 5 years

    Click Events Table (Analytics)

    Column     | Type       | Notes
    event_id   | UUID       | Partition key
    short_code | VARCHAR(7) | FK to URL mapping
    timestamp  | TIMESTAMP  | Click time
    referrer   | TEXT       | HTTP Referer header
    country    | VARCHAR(2) | Derived from IP via GeoIP
    user_agent | TEXT       | Browser/device info

    Storage Choice

    URL mappings: Use a relational database like PostgreSQL or a key-value store like DynamoDB. The access pattern is simple: point lookups by short_code. A key-value store is a natural fit, but PostgreSQL works well too and gives you transactional guarantees for deduplication. Either choice is defensible — justify it.

    If you choose PostgreSQL, add a unique index on long_url for deduplication (optional — discuss whether the same long URL should always produce the same short code, or whether each creation should be independent). If you choose DynamoDB, short_code is the partition key with no sort key needed — a clean single-item lookup pattern.
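
    If you go the PostgreSQL route, the uniqueness guarantee can live in the insert itself. A minimal sketch, assuming psycopg2 and an illustrative url_mapping table mirroring the schema above:

    import psycopg2

    def create_mapping(conn, short_code, long_url):
        """Insert a mapping; returns False if the short code is already taken."""
        with conn.cursor() as cur:
            cur.execute(
                """
                INSERT INTO url_mapping (short_code, long_url, created_at, expires_at)
                VALUES (%s, %s, now(), now() + interval '5 years')
                ON CONFLICT (short_code) DO NOTHING
                """,
                (short_code, long_url),
            )
            inserted = cur.rowcount == 1   # 0 rows means the code already existed
        conn.commit()
        return inserted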

    Click events: This is append-heavy, time-series data. Use a wide-column store like Apache Cassandra, or stream events into Kafka and sink them to a warehouse (BigQuery, Redshift). Do not write click events synchronously in the redirect path; it would add latency to every redirect. The analytics store is a completely separate concern from the URL mapping store, and should be discussed as such.

    4. High-Level Architecture

    The architecture separates the read path (redirect) from the write path (create) and isolates the analytics pipeline entirely. This separation is critical — the redirect path is latency-sensitive and high-throughput, while the create path is low-throughput but requires uniqueness guarantees. The analytics pipeline is fire-and-forget.

    [Architecture diagram: Client → CDN → Load Balancer → URL Service (write path and read path) → Database, with a Redis cache on the read path; click events flow asynchronously into Kafka and an analytics store. Primary path and analytics path are drawn separately.]

    Write Path (Create Short URL)

    1. Client sends POST /api/v1/urls with the long URL.
    2. The request passes through the CDN (not cached) and load balancer to a URL Service instance.
    3. The service generates a unique 7-character short code (see Deep Dive below).
    4. The service writes the (short_code, long_url, metadata) mapping to the database.
    5. The service returns the short URL to the client.

    Read Path (Redirect)

    1. Client visits https://short.ly/aB3kQ7.
    2. The CDN checks its edge cache. On a hit, it returns the 302 redirect immediately — sub-10ms latency.
    3. On a cache miss, the request hits the load balancer and reaches a URL Service instance.
    4. The service checks Redis cache first. On a hit, it returns the redirect.
    5. On a cache miss, it queries the database, populates the cache, and returns the redirect.
    6. Asynchronously, the service emits a click event to Kafka for analytics processing.

    The read path is optimized for the 100:1 read-to-write ratio. Most redirects are served from cache (CDN or Redis) and never touch the database.
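
    In code, the cache-aside lookup at steps 4-5 looks roughly like the sketch below (assuming redis-py; get_from_database stands in for the PostgreSQL/DynamoDB point lookup, and the key naming is illustrative).

    import redis

    cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def get_from_database(short_code):
        """Placeholder for the point lookup against PostgreSQL/DynamoDB."""
        return None

    def resolve(short_code):
        key = f"url:{short_code}"
        long_url = cache.get(key)                    # step 4: try Redis first
        if long_url is not None:
            return long_url
        long_url = get_from_database(short_code)     # step 5: fall back to the database
        if long_url is not None:
            cache.set(key, long_url, ex=24 * 3600)   # populate the cache, 24h TTL
        # step 6 (emitting the click event to Kafka) happens off the hot path
        return long_url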

    5. Deep Dive

    5a. Short Code Generation

    This is the heart of the problem. You need a scheme that produces short, unique, collision-free codes at scale. There are three main approaches:

    Option 1: Hash + Truncate

    Compute MD5 or SHA-256 of the long URL, then take the first 7 characters of its base-62 encoding.

    • Pro: Deterministic — the same long URL always produces the same short code (natural deduplication).
    • Con: Collisions. With a 7-character base-62 space (~3.5 trillion codes), the birthday paradox makes a collision likely once you have stored on the order of the square root of the keyspace, roughly 2 million entries. You must handle collisions: retry with a salt or append a counter.

    Option 2: Counter-Based (Recommended)

    Use a globally unique auto-incrementing counter, then encode the counter value in base-62.

    • Pro: Zero collisions by construction. Simple and fast.
    • Con: Sequential codes are predictable (users can enumerate). A single counter is a bottleneck.

    Solution: Use a distributed counter service. Pre-allocate ranges of IDs to each application server. For example, Server A gets range [1M, 2M), Server B gets [2M, 3M). Each server increments locally with zero coordination until its range is exhausted, then requests a new range. Large-scale ID generators such as Twitter Snowflake and Instagram's sharded IDs attack the same problem with different mechanics (timestamps and shard IDs rather than pre-allocated ranges), but the goal is the same: unique IDs without per-request coordination. A sketch of the range-allocation pattern follows.
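
    In the sketch below, allocate_range is a stand-in for the coordination service (for example a single UPDATE ... RETURNING on a ranges table, or an atomic counter in ZooKeeper/etcd); here it is simulated with a local counter so the idea is runnable.

    import itertools

    RANGE_SIZE = 1_000_000
    _simulated_ranges = itertools.count(start=1_000_000, step=RANGE_SIZE)

    def allocate_range():
        """Stand-in for the coordination service; reserves the next block of IDs."""
        return next(_simulated_ranges)

    class LocalIdGenerator:
        """Each application server runs one of these and mints IDs locally."""
        def __init__(self):
            self._next = None
            self._end = 0

        def next_id(self):
            if self._next is None or self._next >= self._end:
                start = allocate_range()              # rare: only on range exhaustion
                self._next, self._end = start, start + RANGE_SIZE
            value, self._next = self._next, self._next + 1
            return value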

    To avoid predictability, apply a bijective shuffle before base-62 encoding: multiply the counter by a large constant C that is coprime to the keyspace size 62^7 (~3.5 trillion), then reduce modulo 62^7. The result looks random but is still collision-free. For example, if your counter produces the value 42, the shuffled value is (42 * C) mod 62^7. Because C is coprime to the modulus, this mapping is bijective: every input maps to a unique output, and vice versa. The short codes appear random to external observers while requiring zero collision handling internally.
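
    A sketch of the shuffle plus base-62 encoding is below; the multiplier is just an example value that happens to be coprime to 62^7, not a magic constant.

    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    KEYSPACE = 62 ** 7                     # ~3.5 trillion possible codes
    C = 2_654_435_761                      # example multiplier, coprime to 62**7

    def shuffle(counter):
        return (counter * C) % KEYSPACE    # bijective because gcd(C, KEYSPACE) == 1

    def encode_base62(value):
        chars = []
        for _ in range(7):                 # fixed-width 7-character code
            value, rem = divmod(value, 62)
            chars.append(ALPHABET[rem])
        return "".join(reversed(chars))

    short_code = encode_base62(shuffle(42))   # counter 42 -> opaque 7-char code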

    Option 3: Pre-Generated Key Store

    Generate millions of random 7-character codes offline and store them in a key database. When a URL is created, pop a code from the unused pool.

    • Pro: No collision logic at runtime. Codes are unpredictable.
    • Con: Requires managing the key pool — concurrency control to avoid two servers taking the same key. Typically done with a two-table approach: unused_keys and used_keys.
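
    If you prefer a single atomic primitive over the two-table design, a Redis set is a common shortcut: SPOP removes and returns one member atomically, so two servers can never claim the same code. A minimal sketch (key names are illustrative):

    import redis

    pool = redis.Redis(decode_responses=True)

    def pop_unused_code():
        code = pool.spop("unused_short_codes")    # atomic take-one from the pool
        if code is not None:
            pool.sadd("used_short_codes", code)   # optional audit trail
        return code
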
    Interview tip: Present all three options briefly, then commit to one and explain why. The counter-based approach with range allocation is the strongest choice — it is simple, collision-free, and horizontally scalable. Mention the bijective shuffle to address predictability.

    5b. Caching Strategy

    With 4,000+ reads/second and a 50ms p99 target, caching is not optional — it is the core of the read path.

    Two-Layer Cache

    Layer 1 — CDN (CloudFront, Cloudflare): Cache 302 redirect responses at the edge. Set Cache-Control: public, max-age=3600 for non-expiring URLs. This absorbs the vast majority of traffic for popular short URLs. A viral link might receive millions of clicks — all served from the edge.

    Layer 2 — Redis Cluster: Application-level cache in front of the database. Store short_code → long_url mappings with a TTL slightly shorter than the URL expiration. Use a cluster of Redis nodes with consistent hashing for even distribution.

    Cache Eviction and Invalidation

    • TTL-based expiration: Each cache entry expires after 1 hour (CDN) or 24 hours (Redis). This naturally handles URL expiration without explicit invalidation.
    • Write-through on create: When a new short URL is created, populate the Redis cache immediately. The first redirect will be a cache hit.
    • Deletion/expiration: When a URL is deleted or expires, purge it from Redis and issue a CDN invalidation. This is the rare case — optimize for the common path.

    Cache Hit Rate Estimation

    URL access follows a Zipf distribution — a small percentage of URLs receive the vast majority of clicks. In practice, the top 20% of URLs account for 80%+ of redirects. A Redis cluster holding 50 million entries (~5 GB at 100 bytes per entry) can achieve a 90%+ hit rate. With the CDN layer on top, fewer than 5% of requests reach the database.

    Layer       | Hit Rate | Latency
    CDN edge    | ~60-70%  | <10ms
    Redis cache | ~25-30%  | <5ms
    Database    | ~5%      | <20ms

    6. Scaling and Trade-offs

    Bottleneck Analysis

    Database writes: At 40 writes/second average, a single PostgreSQL instance handles this easily. Even at 10x peak (400/s), this is not a bottleneck. If you choose DynamoDB, writes scale horizontally with on-demand capacity.

    Database reads: Without caching, 4,000 reads/second is manageable for PostgreSQL with read replicas. But with the two-layer cache, the database sees fewer than 200 reads/second — trivial.

    Short code generation: The counter-based approach with range allocation is horizontally scalable. Each server generates codes independently. The coordination service (ZooKeeper, etcd, or a simple database table) is only contacted when a server exhausts its range — a rare event.

    Analytics pipeline: This is the actual scaling challenge. At 4,000 clicks/second, the click event stream produces ~350 million events per day. Kafka handles this throughput comfortably. Downstream consumers aggregate events into per-minute, per-hour, and per-day rollups stored in a columnar database.
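
    The rollup step itself is simple. The sketch below aggregates raw click events (assumed here to be dicts with a short_code and an epoch-seconds timestamp, as they might arrive from the Kafka topic) into per-minute counts that a consumer would then write to the analytics store.

    from collections import Counter
    from datetime import datetime, timezone

    def per_minute_rollup(events):
        """Aggregate click events into (short_code, minute-bucket) counts."""
        rollup = Counter()
        for event in events:
            ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
            bucket = ts.replace(second=0, microsecond=0)
            rollup[(event["short_code"], bucket)] += 1
        return rollup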

    CAP Theorem Considerations

    A URL shortener must prioritize availability over consistency. A failed redirect is a broken link — unacceptable. If a URL was just created and is not yet replicated to all nodes, a brief delay is tolerable (eventual consistency). This makes an AP system the right choice.

    In practice, you need a nuanced position — not a blanket choice of AP or CP:

    • Use eventual consistency for read replicas. A few seconds of replication lag is acceptable — a brand-new URL rarely receives clicks in its first second of existence.
    • Use strong consistency for the uniqueness check during short code creation (write path). This prevents duplicate codes. If you use the counter-based approach with range pre-allocation, this strong consistency requirement is limited to the range coordination service, not every write.
    • For the analytics pipeline, eventual consistency is more than acceptable — analytics data can be minutes behind real-time without any user impact.

    Reliability and Fault Tolerance

    • Database: Multi-AZ deployment with automated failover. Daily backups with point-in-time recovery.
    • Redis: Redis Sentinel or Cluster mode with automatic failover. If the entire cache layer fails, the system degrades gracefully — requests fall through to the database, increasing latency but not causing errors.
    • Kafka: Replication factor of 3 across brokers. If click events are lost, analytics are slightly inaccurate — an acceptable trade-off compared to losing URL mappings.
    • Rate limiting: Apply per-IP and per-user rate limits on the create endpoint to prevent abuse. A simple token bucket at the load balancer layer works.
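
    For reference, the token bucket itself is only a few lines. The sketch below is an in-process version for illustration; in production the bucket state would live at the load balancer or in Redis, keyed by IP or user ID.

    import time

    class TokenBucket:
        def __init__(self, rate_per_sec, burst):
            self.rate = rate_per_sec
            self.capacity = burst
            self.tokens = float(burst)
            self.last = time.monotonic()

        def allow(self):
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    limiter = TokenBucket(rate_per_sec=5, burst=10)   # e.g. 5 creates/second per client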

    Monitoring and Observability

    • Key metrics: Redirect latency (p50, p95, p99), cache hit rates (CDN + Redis), error rates (4xx, 5xx), Kafka consumer lag, database connection pool utilization.
    • Alerting: Page on p99 redirect latency exceeding 100ms, error rate above 0.1%, or Kafka consumer lag exceeding 10,000 events.
    • Dashboards: Real-time redirect QPS, top-N short URLs by click volume, new URLs created per hour.

    Scoring Tips

    Interviewers evaluate system design answers across four dimensions. Here is how to maximize your score on the URL shortener problem:

    Dimension            | What Strong Looks Like
    Requirements Clarity | State concrete numbers (100M URLs/month, 100:1 read-write ratio) before jumping into design. Clarify 301 vs 302 trade-off proactively.
    Architecture Quality | Draw the full read and write paths. Show the two-layer cache. Separate the analytics pipeline from the redirect path — never add latency to the hot path.
    Technical Depth      | Go deep on short code generation. Present multiple options, pick one, and justify the choice. Discuss the bijective shuffle for unpredictability. Know the birthday paradox numbers.
    Communication        | Lead the conversation. State your approach before drawing. Proactively discuss trade-offs — do not wait for the interviewer to poke holes.
    Common pitfalls to avoid: (1) Using MD5 without discussing collision handling. (2) Caching with 301 redirects and then claiming you can track analytics. (3) Writing click events synchronously in the redirect path. (4) Ignoring the counter bottleneck in a distributed deployment. (5) Not discussing what happens when a URL expires. (6) Over-engineering the write path when the system is read-heavy. (7) Forgetting to mention rate limiting on the create endpoint — without it, an attacker can exhaust your keyspace.

    Time Management in the Interview

    A typical system design round is 45 minutes, of which 35-40 are active design time. Here is how to allocate it for this problem:

    Stage                   | Time      | Notes
    Requirements            | 3-5 min   | Clarify scope, state numbers
    API Design              | 3-4 min   | 3 endpoints, request/response
    Data Model              | 3-4 min   | Schema + storage justification
    High-Level Architecture | 8-10 min  | Draw diagram, walk through both paths
    Deep Dive               | 10-12 min | Go deep on 1-2 topics (interviewer-led)
    Scaling                 | 5-7 min   | Bottlenecks, CAP, monitoring

    The URL shortener question rewards breadth and depth in equal measure. Cover all six stages, go deep where the interviewer shows interest, and always articulate the trade-offs behind your decisions. Practice walking through this design end-to-end in 35 minutes — that is the typical time budget in a real interview. The strongest candidates finish each stage crisply and leave the interviewer with nothing to poke at — or better yet, proactively surface the issues the interviewer was about to raise.

    For hands-on practice with real-time AI feedback on your system design delivery, explore mock interviews on Hoppers AI — including dedicated system design sessions that evaluate your requirements gathering, architecture, and communication in real time.