    Hoppers AI Team · March 12, 2026 · 12 min read

    Design a URL Shortener — Complete System Design Walkthrough

    The URL shortener is one of the most frequently asked system design questions at top-tier companies — appearing in interviews at Google, Meta, Amazon, and countless startups. It appears deceptively simple — take a long URL, return a short one — but a strong answer demands careful reasoning about encoding schemes, read-heavy traffic patterns, caching hierarchies, and analytics pipelines. The question tests whether you can navigate trade-offs across storage, computation, and networking under realistic constraints.

    This guide walks through the problem in six stages, mirroring the structure interviewers expect. Each stage builds on the previous one, and we call out the exact moments where candidates typically score points or lose them.

    1. Requirements

    Start every system design interview by clarifying requirements. Do not jump into architecture. Spend 3-5 minutes here — it signals maturity and prevents you from solving the wrong problem.

    Functional Requirements

    • Shorten: Given a long URL, generate a unique short URL (e.g., https://short.ly/aB3kQ7).
    • Redirect: When a user visits a short URL, redirect them to the original long URL via HTTP 301 or 302.
    • Custom aliases: Allow users to optionally specify a custom short code (e.g., short.ly/my-brand).
    • Expiration: Support optional TTL on short URLs. Default lifetime: 5 years.
    • Analytics: Track click counts, referrers, geolocation, and timestamps per short URL.

    Non-Functional Requirements

    • Scale: 100 million new URLs created per month (~40 URLs/second write). Read-to-write ratio of 100:1 means ~4,000 redirects/second.
    • Latency: Redirect latency under 50ms at p99.
    • Availability: 99.99% uptime — a redirect failure is a broken link on the internet.
    • Durability: Once created, a short URL must never lose its mapping.
    • URL length: Short codes should be 7 characters or fewer.
    Why these numbers matter: Stating concrete figures shows the interviewer you can reason about capacity. 100M URLs/month over 5 years = 6 billion records. At ~500 bytes per record, that is roughly 3 TB of mapping data — comfortably within a single well-partitioned database.

    Back-of-Envelope Estimates

    Metric                                 | Value
    New URLs/month                         | 100M
    Write QPS (avg)                        | ~40
    Read QPS (avg)                         | ~4,000
    Peak read QPS (5x)                     | ~20,000
    Storage (5 years)                      | ~3 TB
    Short code keyspace (base-62, 7 chars) | ~3.5 trillion
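
    To sanity-check these figures, the arithmetic fits in a few lines. The sketch below simply re-derives the table from the assumptions stated above; the constants are the same ones quoted in the requirements.

    SECONDS_PER_MONTH = 30 * 24 * 3600            # ~2.6 million seconds
    new_urls_per_month = 100_000_000
    read_to_write_ratio = 100
    bytes_per_record = 500                        # rough average mapping size
    retention_years = 5

    write_qps = new_urls_per_month / SECONDS_PER_MONTH          # ~40 writes/second
    read_qps = write_qps * read_to_write_ratio                   # ~4,000 reads/second
    total_records = new_urls_per_month * 12 * retention_years    # 6 billion mappings
    storage_tb = total_records * bytes_per_record / 1e12         # ~3 TB

    print(f"write QPS ~{write_qps:.0f}, read QPS ~{read_qps:.0f}")
    print(f"records: {total_records:,}, storage: ~{storage_tb:.1f} TB")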

    2. API Design

    Three endpoints cover the core surface area. Keep the API RESTful and versioned. A common mistake is designing too many endpoints — resist the urge to add a PUT for updating URLs or a GET for listing all URLs until the interviewer asks for it. Start minimal and expand.

    POST /api/v1/urls — Create Short URL

    Request:

    {
      "longUrl": "https://example.com/very/long/path?query=value",
      "customAlias": "my-brand",   // optional
      "expiresAt": "2027-03-12T00:00:00Z"  // optional
    }

    Response (201 Created):

    {
      "shortCode": "my-brand",
      "shortUrl": "https://short.ly/my-brand",
      "longUrl": "https://example.com/very/long/path?query=value",
      "expiresAt": "2027-03-12T00:00:00Z",
      "createdAt": "2026-03-12T10:30:00Z"
    }

    GET /{shortCode} — Redirect

    Returns HTTP 302 Found with Location: {longUrl}. We prefer 302 over 301 because 301 causes browsers to cache the redirect permanently, which would prevent us from collecting analytics and honoring expiration changes. If analytics are not required and you want maximum CDN cacheability, 301 is the better choice — state your reasoning explicitly.
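
    As a concrete illustration, here is a minimal sketch of the redirect handler, assuming a Python service built on Flask; lookup_long_url is a placeholder for the cache/database lookup covered in the architecture section.

    from flask import Flask, abort, redirect

    app = Flask(__name__)

    def lookup_long_url(short_code):
        """Placeholder: check Redis first, then the database (see sections 4-5)."""
        return None

    @app.route("/<short_code>")
    def follow(short_code):
        long_url = lookup_long_url(short_code)
        if long_url is None:
            abort(404)
        # 302 keeps browsers coming back to us, so clicks stay countable and
        # expiration changes take effect; a 301 would be cached permanently.
        response = redirect(long_url, code=302)
        response.headers["Cache-Control"] = "public, max-age=3600"  # CDN-friendly
        return response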

    GET /api/v1/urls/{shortCode}/stats — Analytics

    Response:

    {
      "shortCode": "aB3kQ7",
      "totalClicks": 284910,
      "clicksByDay": [...],
      "topReferrers": [...],
      "topCountries": [...]
    }

    3. Data Model

    URL Mapping Table

    Column     | Type           | Notes
    short_code | VARCHAR(7), PK | Base-62 encoded or custom alias
    long_url   | TEXT           | Original URL, indexed for dedup
    user_id    | VARCHAR(36)    | Owner (nullable for anonymous)
    created_at | TIMESTAMP      | Creation time
    expires_at | TIMESTAMP      | Nullable; default = created_at + 5 years

    Click Events Table (Analytics)

    Column     | Type       | Notes
    event_id   | UUID       | Partition key
    short_code | VARCHAR(7) | FK to URL mapping
    timestamp  | TIMESTAMP  | Click time
    referrer   | TEXT       | HTTP Referer header
    country    | VARCHAR(2) | Derived from IP via GeoIP
    user_agent | TEXT       | Browser/device info

    Storage Choice

    URL mappings: Use a relational database like PostgreSQL or a key-value store like DynamoDB. The access pattern is simple: point lookups by short_code. A key-value store is a natural fit, but PostgreSQL works well too and gives you transactional guarantees for deduplication. Either choice is defensible — justify it.

    If you choose PostgreSQL, add a unique index on long_url for deduplication (optional — discuss whether the same long URL should always produce the same short code, or whether each creation should be independent). If you choose DynamoDB, short_code is the partition key with no sort key needed — a clean single-item lookup pattern.
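
    If you go the PostgreSQL route, the uniqueness guarantee can live in the insert itself. A minimal sketch, assuming psycopg2 and an illustrative url_mapping table mirroring the schema above:

    import psycopg2

    def create_mapping(conn, short_code, long_url):
        """Insert a mapping; returns False if the short code is already taken."""
        with conn.cursor() as cur:
            cur.execute(
                """
                INSERT INTO url_mapping (short_code, long_url, created_at, expires_at)
                VALUES (%s, %s, now(), now() + interval '5 years')
                ON CONFLICT (short_code) DO NOTHING
                """,
                (short_code, long_url),
            )
            inserted = cur.rowcount == 1   # 0 rows means the code already existed
        conn.commit()
        return inserted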

    Click events: This is append-heavy, time-series data. Use a wide-column store like Apache Cassandra, or stream events into Kafka and sink them to a warehouse (BigQuery, Redshift). Do not write click events synchronously in the redirect path; it would add latency to every redirect. The analytics store is a completely separate concern from the URL mapping store, and should be discussed as such.

    4. High-Level Architecture

    The architecture separates the read path (redirect) from the write path (create) and isolates the analytics pipeline entirely. This separation is critical — the redirect path is latency-sensitive and high-throughput, while the create path is low-throughput but requires uniqueness guarantees. The analytics pipeline is fire-and-forget.

    [Architecture diagram: Client → CDN → Load Balancer → URL Service (write path and read path) → Database, with a Redis cache on the read path; click events flow asynchronously into Kafka and an analytics store. Primary path and analytics path are drawn separately.]

    Write Path (Create Short URL)

    1. Client sends POST /api/v1/urls with the long URL.
    2. The request passes through the CDN (not cached) and load balancer to a URL Service instance.
    3. The service generates a unique 7-character short code (see Deep Dive below).
    4. The service writes the (short_code, long_url, metadata) mapping to the database.
    5. The service returns the short URL to the client.

    Read Path (Redirect)

    1. Client visits https://short.ly/aB3kQ7.
    2. The CDN checks its edge cache. On a hit, it returns the 302 redirect immediately — sub-10ms latency.
    3. On a cache miss, the request hits the load balancer and reaches a URL Service instance.
    4. The service checks Redis cache first. On a hit, it returns the redirect.
    5. On a cache miss, it queries the database, populates the cache, and returns the redirect.
    6. Asynchronously, the service emits a click event to Kafka for analytics processing.

    The read path is optimized for the 100:1 read-to-write ratio. Most redirects are served from cache (CDN or Redis) and never touch the database.
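
    In code, the cache-aside lookup at steps 4-5 looks roughly like the sketch below (assuming redis-py; get_from_database stands in for the PostgreSQL/DynamoDB point lookup, and the key naming is illustrative).

    import redis

    cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

    def get_from_database(short_code):
        """Placeholder for the point lookup against PostgreSQL/DynamoDB."""
        return None

    def resolve(short_code):
        key = f"url:{short_code}"
        long_url = cache.get(key)                    # step 4: try Redis first
        if long_url is not None:
            return long_url
        long_url = get_from_database(short_code)     # step 5: fall back to the database
        if long_url is not None:
            cache.set(key, long_url, ex=24 * 3600)   # populate the cache, 24h TTL
        # step 6 (emitting the click event to Kafka) happens off the hot path
        return long_url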

    5. Deep Dive

    5a. Short Code Generation

    This is the heart of the problem. You need a scheme that produces short, unique, collision-free codes at scale. There are three main approaches:

    Option 1: Hash + Truncate

    Compute MD5 or SHA-256 of the long URL, then take the first 7 characters of its base-62 encoding.

    • Pro: Deterministic — the same long URL always produces the same short code (natural deduplication).
    • Con: Collisions. With a 7-character base-62 space (~3.5 trillion codes), the birthday paradox makes a collision likely once you have stored on the order of the square root of the keyspace, roughly 2 million entries. You must handle collisions: retry with a salt or append a counter.

    Option 2: Counter-Based (Recommended)

    Use a globally unique auto-incrementing counter, then encode the counter value in base-62.

    • Pro: Zero collisions by construction. Simple and fast.
    • Con: Sequential codes are predictable (users can enumerate). A single counter is a bottleneck.

    Solution: Use a distributed counter service. Pre-allocate ranges of IDs to each application server. For example, Server A gets range [1M, 2M), Server B gets [2M, 3M). Each server increments locally with zero coordination until its range is exhausted, then requests a new range. Large-scale ID generators such as Twitter Snowflake and Instagram's sharded IDs attack the same problem with different mechanics (timestamps and shard IDs rather than pre-allocated ranges), but the goal is the same: unique IDs without per-request coordination. A sketch of the range-allocation pattern follows.
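
    In the sketch below, allocate_range is a stand-in for the coordination service (for example a single UPDATE ... RETURNING on a ranges table, or an atomic counter in ZooKeeper/etcd); here it is simulated with a local counter so the idea is runnable.

    import itertools

    RANGE_SIZE = 1_000_000
    _simulated_ranges = itertools.count(start=1_000_000, step=RANGE_SIZE)

    def allocate_range():
        """Stand-in for the coordination service; reserves the next block of IDs."""
        return next(_simulated_ranges)

    class LocalIdGenerator:
        """Each application server runs one of these and mints IDs locally."""
        def __init__(self):
            self._next = None
            self._end = 0

        def next_id(self):
            if self._next is None or self._next >= self._end:
                start = allocate_range()              # rare: only on range exhaustion
                self._next, self._end = start, start + RANGE_SIZE
            value, self._next = self._next, self._next + 1
            return value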

    To avoid predictability, apply a bijective shuffle before base-62 encoding: multiply the counter by a large constant C that is coprime to the keyspace size 62^7 (~3.5 trillion), then reduce modulo 62^7. The result looks random but is still collision-free. For example, if your counter produces the value 42, the shuffled value is (42 * C) mod 62^7. Because C is coprime to the modulus, this mapping is bijective: every input maps to a unique output, and vice versa. The short codes appear random to external observers while requiring zero collision handling internally.
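
    A sketch of the shuffle plus base-62 encoding is below; the multiplier is just an example value that happens to be coprime to 62^7, not a magic constant.

    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    KEYSPACE = 62 ** 7                     # ~3.5 trillion possible codes
    C = 2_654_435_761                      # example multiplier, coprime to 62**7

    def shuffle(counter):
        return (counter * C) % KEYSPACE    # bijective because gcd(C, KEYSPACE) == 1

    def encode_base62(value):
        chars = []
        for _ in range(7):                 # fixed-width 7-character code
            value, rem = divmod(value, 62)
            chars.append(ALPHABET[rem])
        return "".join(reversed(chars))

    short_code = encode_base62(shuffle(42))   # counter 42 -> opaque 7-char code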

    Option 3: Pre-Generated Key Store

    Generate millions of random 7-character codes offline and store them in a key database. When a URL is created, pop a code from the unused pool.

    • Pro: No collision logic at runtime. Codes are unpredictable.
    • Con: Requires managing the key pool — concurrency control to avoid two servers taking the same key. Typically done with a two-table approach: unused_keys and used_keys.
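
    If you prefer a single atomic primitive over the two-table design, a Redis set is a common shortcut: SPOP removes and returns one member atomically, so two servers can never claim the same code. A minimal sketch (key names are illustrative):

    import redis

    pool = redis.Redis(decode_responses=True)

    def pop_unused_code():
        code = pool.spop("unused_short_codes")    # atomic take-one from the pool
        if code is not None:
            pool.sadd("used_short_codes", code)   # optional audit trail
        return code
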
    Interview tip: Present all three options briefly, then commit to one and explain why. The counter-based approach with range allocation is the strongest choice — it is simple, collision-free, and horizontally scalable. Mention the bijective shuffle to address predictability.

    5b. Caching Strategy

    With 4,000+ reads/second and a 50ms p99 target, caching is not optional — it is the core of the read path.

    Two-Layer Cache

    Layer 1 — CDN (CloudFront, Cloudflare): Cache 302 redirect responses at the edge. Set Cache-Control: public, max-age=3600 for non-expiring URLs. This absorbs the vast majority of traffic for popular short URLs. A viral link might receive millions of clicks — all served from the edge.

    Layer 2 — Redis Cluster: Application-level cache in front of the database. Store short_code → long_url mappings with a TTL slightly shorter than the URL expiration. Use a cluster of Redis nodes with consistent hashing for even distribution.

    Cache Eviction and Invalidation

    • TTL-based expiration: Each cache entry expires after 1 hour (CDN) or 24 hours (Redis). This naturally handles URL expiration without explicit invalidation.
    • Write-through on create: When a new short URL is created, populate the Redis cache immediately. The first redirect will be a cache hit.
    • Deletion/expiration: When a URL is deleted or expires, purge it from Redis and issue a CDN invalidation. This is the rare case — optimize for the common path.

    Cache Hit Rate Estimation

    URL access follows a Zipf distribution — a small percentage of URLs receive the vast majority of clicks. In practice, the top 20% of URLs account for 80%+ of redirects. A Redis cluster holding 50 million entries (~5 GB at 100 bytes per entry) can achieve a 90%+ hit rate. With the CDN layer on top, fewer than 5% of requests reach the database.

    Layer       | Hit Rate | Latency
    CDN edge    | ~60-70%  | <10ms
    Redis cache | ~25-30%  | <5ms
    Database    | ~5%      | <20ms

    6. Scaling and Trade-offs

    Bottleneck Analysis

    Database writes: At 40 writes/second average, a single PostgreSQL instance handles this easily. Even at 10x peak (400/s), this is not a bottleneck. If you choose DynamoDB, writes scale horizontally with on-demand capacity.

    Database reads: Without caching, 4,000 reads/second is manageable for PostgreSQL with read replicas. But with the two-layer cache, the database sees fewer than 200 reads/second — trivial.

    Short code generation: The counter-based approach with range allocation is horizontally scalable. Each server generates codes independently. The coordination service (ZooKeeper, etcd, or a simple database table) is only contacted when a server exhausts its range — a rare event.

    Analytics pipeline: This is the actual scaling challenge. At 4,000 clicks/second, the click event stream produces ~350 million events per day. Kafka handles this throughput comfortably. Downstream consumers aggregate events into per-minute, per-hour, and per-day rollups stored in a columnar database.
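
    The rollup step itself is simple. The sketch below aggregates raw click events (assumed here to be dicts with a short_code and an epoch-seconds timestamp, as they might arrive from the Kafka topic) into per-minute counts that a consumer would then write to the analytics store.

    from collections import Counter
    from datetime import datetime, timezone

    def per_minute_rollup(events):
        """Aggregate click events into (short_code, minute-bucket) counts."""
        rollup = Counter()
        for event in events:
            ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
            bucket = ts.replace(second=0, microsecond=0)
            rollup[(event["short_code"], bucket)] += 1
        return rollup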

    CAP Theorem Considerations

    A URL shortener must prioritize availability over consistency. A failed redirect is a broken link — unacceptable. If a URL was just created and is not yet replicated to all nodes, a brief delay is tolerable (eventual consistency). This makes an AP system the right choice.

    In practice, you need a nuanced position — not a blanket choice of AP or CP:

    • Use eventual consistency for read replicas. A few seconds of replication lag is acceptable — a brand-new URL rarely receives clicks in its first second of existence.
    • Use strong consistency for the uniqueness check during short code creation (write path). This prevents duplicate codes. If you use the counter-based approach with range pre-allocation, this strong consistency requirement is limited to the range coordination service, not every write.
    • For the analytics pipeline, eventual consistency is more than acceptable — analytics data can be minutes behind real-time without any user impact.

    Reliability and Fault Tolerance

    • Database: Multi-AZ deployment with automated failover. Daily backups with point-in-time recovery.
    • Redis: Redis Sentinel or Cluster mode with automatic failover. If the entire cache layer fails, the system degrades gracefully — requests fall through to the database, increasing latency but not causing errors.
    • Kafka: Replication factor of 3 across brokers. If click events are lost, analytics are slightly inaccurate — an acceptable trade-off compared to losing URL mappings.
    • Rate limiting: Apply per-IP and per-user rate limits on the create endpoint to prevent abuse. A simple token bucket at the load balancer layer works.
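
    For reference, the token bucket itself is only a few lines. The sketch below is an in-process version for illustration; in production the bucket state would live at the load balancer or in Redis, keyed by IP or user ID.

    import time

    class TokenBucket:
        def __init__(self, rate_per_sec, burst):
            self.rate = rate_per_sec
            self.capacity = burst
            self.tokens = float(burst)
            self.last = time.monotonic()

        def allow(self):
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    limiter = TokenBucket(rate_per_sec=5, burst=10)   # e.g. 5 creates/second per client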

    Monitoring and Observability

    • Key metrics: Redirect latency (p50, p95, p99), cache hit rates (CDN + Redis), error rates (4xx, 5xx), Kafka consumer lag, database connection pool utilization.
    • Alerting: Page on p99 redirect latency exceeding 100ms, error rate above 0.1%, or Kafka consumer lag exceeding 10,000 events.
    • Dashboards: Real-time redirect QPS, top-N short URLs by click volume, new URLs created per hour.

    Scoring Tips

    Interviewers evaluate system design answers across four dimensions. Here is how to maximize your score on the URL shortener problem:

    Dimension            | What Strong Looks Like
    Requirements Clarity | State concrete numbers (100M URLs/month, 100:1 read-write ratio) before jumping into design. Clarify 301 vs 302 trade-off proactively.
    Architecture Quality | Draw the full read and write paths. Show the two-layer cache. Separate the analytics pipeline from the redirect path — never add latency to the hot path.
    Technical Depth      | Go deep on short code generation. Present multiple options, pick one, and justify the choice. Discuss the bijective shuffle for unpredictability. Know the birthday paradox numbers.
    Communication        | Lead the conversation. State your approach before drawing. Proactively discuss trade-offs — do not wait for the interviewer to poke holes.
    Common pitfalls to avoid: (1) Using MD5 without discussing collision handling. (2) Caching with 301 redirects and then claiming you can track analytics. (3) Writing click events synchronously in the redirect path. (4) Ignoring the counter bottleneck in a distributed deployment. (5) Not discussing what happens when a URL expires. (6) Over-engineering the write path when the system is read-heavy. (7) Forgetting to mention rate limiting on the create endpoint — without it, an attacker can exhaust your keyspace.

    Time Management in the Interview

    A typical system design round is 45 minutes, of which 35-40 are active design time. Here is how to allocate it for this problem:

    Stage                   | Time      | Notes
    Requirements            | 3-5 min   | Clarify scope, state numbers
    API Design              | 3-4 min   | 3 endpoints, request/response
    Data Model              | 3-4 min   | Schema + storage justification
    High-Level Architecture | 8-10 min  | Draw diagram, walk through both paths
    Deep Dive               | 10-12 min | Go deep on 1-2 topics (interviewer-led)
    Scaling                 | 5-7 min   | Bottlenecks, CAP, monitoring

    The URL shortener question rewards breadth and depth in equal measure. Cover all six stages, go deep where the interviewer shows interest, and always articulate the trade-offs behind your decisions. Practice walking through this design end-to-end in 35 minutes — that is the typical time budget in a real interview. The strongest candidates finish each stage crisply and leave the interviewer with nothing to poke at — or better yet, proactively surface the issues the interviewer was about to raise.

    For hands-on practice with real-time AI feedback on your system design delivery, explore mock interviews on Hoppers AI — including dedicated system design sessions that evaluate your requirements gathering, architecture, and communication in real time.