

    Hoppers AI Team · March 12, 2026 · 12 min read

    Design a Video Streaming Platform — Complete System Design Walkthrough

    Video streaming is one of the most demanding system design problems you will encounter in interviews. It combines massive storage requirements, compute-intensive processing pipelines, real-time delivery under strict latency constraints, and a global CDN strategy. In this walkthrough, we will design a platform similar to YouTube or Netflix from scratch, covering each stage of a structured system design interview.

    [Figure: Video Streaming Platform — Upload & Playback Architecture. Upload pipeline: User → Upload Service → Transcoding Pipeline → Object Storage → CDN (raw file, transcode queue, HLS push). Playback flow: User → CDN Edge → Origin Store (request, manifest, cache miss, stream segments) with adaptive bitrate selection. Legend distinguishes synchronous from asynchronous calls.]

    Stage 1: Requirements Gathering

    Start by defining scope with your interviewer. Video streaming platforms have enormous surface area, so narrowing early is essential. Spend 3-5 minutes here.

    Functional Requirements

    • Video upload — Creators upload videos of varying sizes (up to 10 GB). The system accepts the raw file, validates it, and prepares it for streaming.
    • Video transcoding — Convert uploaded videos into multiple resolutions (240p, 360p, 480p, 720p, 1080p, 4K) and codecs (H.264, H.265/HEVC, VP9, AV1) for broad device compatibility.
    • Adaptive bitrate streaming — Deliver video using HLS or DASH so the player can dynamically switch quality based on the viewer's network conditions.
    • Video search — Users can search for videos by title, description, tags, and creator name.
    • Recommendations — Personalized video feed based on watch history, likes, and trending content.
    • Video playback — Low-latency start (under 2 seconds), seek support, and smooth playback across devices (web, mobile, smart TV).

    Non-Functional Requirements

    • Scale: 2 billion total users, 500 million DAU, 500 hours of video uploaded per minute, 1 billion video views per day.
    • Latency: Video playback start under 2 seconds. Upload acknowledgment (not processing) within seconds.
    • Availability: 99.99% for playback. Upload pipeline can tolerate slightly lower availability (99.9%) with retry semantics.
    • Durability: Zero data loss on uploaded videos. Once a creator uploads a video, it must never be lost.
    • Storage: At 500 hours/minute with an average of 1 GB per hour of raw video, that is roughly 30 TB of raw uploads per hour, or about 720 TB per day. After transcoding to multiple resolutions, storage multiplies by 5-8x.
    Interview tip: Convert your scale numbers into actionable metrics early. 1 billion views/day equals roughly 11,500 video starts per second on average, with peaks of 3-5x during prime time. These numbers directly inform your CDN capacity and origin server sizing.
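
    A quick back-of-envelope script makes these numbers concrete; the constants below simply restate the assumptions above (the 4x peak multiplier sits inside the stated 3-5x range):

    ```python
    # Back-of-envelope numbers derived from the stated requirements.
    SECONDS_PER_DAY = 86_400

    upload_hours_per_minute = 500        # 500 hours of video uploaded per minute
    raw_gb_per_hour = 1                  # assumed average: 1 GB per hour of raw video
    views_per_day = 1_000_000_000        # 1 billion views per day

    raw_upload_tb_per_day = upload_hours_per_minute * 60 * 24 * raw_gb_per_hour / 1000
    avg_starts_per_sec = views_per_day / SECONDS_PER_DAY
    peak_starts_per_sec = avg_starts_per_sec * 4          # assume a 3-5x prime-time peak

    print(f"Raw uploads:       ~{raw_upload_tb_per_day:,.0f} TB/day")  # ~720 TB/day
    print(f"Avg video starts:  ~{avg_starts_per_sec:,.0f}/s")          # ~11,574/s
    print(f"Peak video starts: ~{peak_starts_per_sec:,.0f}/s")         # ~46,000/s
    ```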

    Stage 2: API Design

    A video streaming platform exposes REST APIs for upload and metadata management, and uses streaming protocols (HLS/DASH) for video delivery.

    Upload API

    | Method | Endpoint | Description |
    |---|---|---|
    | POST | /v1/videos/upload/init | Initialize multipart upload. Returns uploadId and pre-signed URLs for each chunk. |
    | PUT | /v1/videos/upload/{uploadId}/parts/{partNumber} | Upload a single chunk (pre-signed URL to object storage). |
    | POST | /v1/videos/upload/{uploadId}/complete | Finalize upload. Triggers transcoding pipeline. Body: { title, description, tags[], categoryId, visibility } |

    Multipart upload is essential for large files. The client uploads directly to object storage using pre-signed URLs, bypassing our application servers entirely. This keeps upload bandwidth off our compute layer.
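
    As a rough sketch, the init endpoint might look like the following, assuming S3 via boto3; the bucket name, part size, and URL expiry are illustrative:

    ```python
    import uuid
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "raw-video-uploads"  # illustrative bucket name

    def init_multipart_upload(file_size_bytes: int, part_size: int = 8 * 1024 * 1024) -> dict:
        """Create the S3 multipart upload and return pre-signed URLs for each part."""
        video_id = str(uuid.uuid4())
        key = f"raw/{video_id}"

        upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=key)["UploadId"]
        num_parts = -(-file_size_bytes // part_size)  # ceiling division

        part_urls = [
            s3.generate_presigned_url(
                "upload_part",
                Params={"Bucket": BUCKET, "Key": key,
                        "UploadId": upload_id, "PartNumber": part},
                ExpiresIn=3600,
            )
            for part in range(1, num_parts + 1)
        ]
        # A video row with status='uploading' would also be inserted in PostgreSQL here.
        return {"videoId": video_id, "uploadId": upload_id, "partUrls": part_urls}
    ```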

    Video Metadata API

    | Method | Endpoint | Description |
    |---|---|---|
    | GET | /v1/videos/{videoId} | Get video metadata (title, description, view count, manifest URL, thumbnails). |
    | GET | /v1/videos/{videoId}/stream | Returns the HLS/DASH manifest URL (redirects to CDN). Player uses this to begin adaptive playback. |
    | GET | /v1/search?q={query}&cursor=&limit=20 | Search videos by text query. Returns paginated results with cursor. |
    | GET | /v1/feed?cursor=&limit=20 | Personalized recommendation feed for authenticated user. |
    | POST | /v1/videos/{videoId}/views | Record a view event. Fire-and-forget from the client. |
    Design decision: Why pre-signed URLs for upload? Uploading multi-gigabyte files through our API servers would consume enormous bandwidth and create a bottleneck. Pre-signed URLs let clients upload directly to S3 (or GCS), and we only handle the lightweight metadata requests. This pattern scales independently of upload volume.

    Stage 3: Data Model

    A video streaming platform demands polyglot persistence — no single database handles all access patterns optimally.

    Video Metadata (PostgreSQL / Vitess)

    | Column | Type | Notes |
    |---|---|---|
    | video_id | UUID (PK) | Globally unique identifier |
    | creator_id | UUID (FK) | References users table |
    | title | varchar(500) | Searchable |
    | description | text | Searchable |
    | status | enum | uploading, processing, ready, failed, removed |
    | visibility | enum | public, unlisted, private |
    | duration_seconds | int | Set after transcoding completes |
    | manifest_path | varchar | S3 path to HLS master playlist |
    | thumbnail_urls | jsonb | Auto-generated + custom thumbnails |
    | tags | text[] | Used for search and recommendations |
    | created_at | timestamp | |
    | updated_at | timestamp | |

    User Data (PostgreSQL)

    | Column | Type | Notes |
    |---|---|---|
    | user_id | UUID (PK) | |
    | username | varchar (unique) | |
    | email | varchar (unique) | |
    | subscriber_count | bigint | Denormalized counter |
    | created_at | timestamp | |

    View Counts (Redis + Cassandra)

    View counting is a special problem at this scale. We use a two-tier approach:

    • Redis — Real-time counter. Each view event increments the counter with INCR views:{video_id}. The value displayed to users is read from Redis.
    • Cassandra — Durable event log. Every view event is written to a Cassandra table partitioned by video_id and bucketed by date. This feeds analytics, monetization, and reconciliation jobs that periodically sync Redis counters.
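
    A minimal sketch of this two-tier write, assuming the redis and cassandra-driver Python clients (keyspace, table, and key names are illustrative):

    ```python
    import datetime
    import uuid

    import redis
    from cassandra.cluster import Cluster

    r = redis.Redis()
    session = Cluster(["cassandra-node-1"]).connect("analytics")  # illustrative keyspace

    def record_view_event(video_id: str, user_id: str) -> None:
        # 1. Real-time counter shown to users.
        r.incr(f"views:{video_id}")

        # 2. Durable event log, partitioned by video and bucketed by day.
        day_bucket = datetime.date.today().isoformat()
        session.execute(
            "INSERT INTO video_views (video_id, day_bucket, event_id, user_id) "
            "VALUES (%s, %s, %s, %s)",
            (uuid.UUID(video_id), day_bucket, uuid.uuid1(), uuid.UUID(user_id)),
        )
    ```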

    Comments (Cassandra)

    | Column | Type | Role |
    |---|---|---|
    | video_id | UUID | Partition key |
    | comment_id | TimeUUID | Clustering key (DESC) |
    | user_id | UUID | |
    | content | text | |
    | parent_comment_id | UUID (nullable) | For threaded replies |
    | created_at | timestamp | |

    Search Index (Elasticsearch)

    Video metadata (title, description, tags, creator name) is indexed in Elasticsearch. A CDC pipeline (Debezium or application-level dual writes) keeps the search index in sync with the primary PostgreSQL store. Elasticsearch handles full-text search, fuzzy matching, and relevance scoring.
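
    A minimal sketch of the indexing and query side, assuming the official elasticsearch Python client (8.x-style keyword arguments); the index name, fields, and boosts are illustrative:

    ```python
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # illustrative endpoint

    def index_video(video: dict) -> None:
        # Called by the CDC consumer whenever a video row changes in PostgreSQL.
        es.index(index="videos", id=video["video_id"], document={
            "title": video["title"],
            "description": video["description"],
            "tags": video["tags"],
            "creator_name": video["creator_name"],
        })

    def search_videos(query: str, size: int = 20) -> list[dict]:
        resp = es.search(index="videos", size=size, query={
            "multi_match": {
                "query": query,
                "fields": ["title^3", "tags^2", "description", "creator_name"],
                "fuzziness": "AUTO",
            }
        })
        return [hit["_source"] for hit in resp["hits"]["hits"]]
    ```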

    Storage Choice Rationale

    • PostgreSQL for metadata: Relational integrity for users and video metadata. At hundreds of millions of rows (not billions), sharded PostgreSQL (Vitess) handles the load. Strong consistency for ownership and visibility controls.
    • Cassandra for views and comments: Append-heavy, partition-friendly access patterns. Views are partitioned by video_id with date bucketing. Comments are partitioned by video_id for co-located reads.
    • Redis for real-time counters: Sub-millisecond reads for view counts displayed on every page load. Eventual consistency with Cassandra is acceptable — a count being off by a few hundred on a video with millions of views is invisible to users.
    • Elasticsearch for search: Full-text search with relevance ranking, autocomplete, and typo tolerance are all native capabilities.
    • Object storage (S3/GCS) for video files: The actual video segments, manifests, and thumbnails. Virtually unlimited capacity with 11 nines of durability.

    Stage 4: High-Level Architecture

    The architecture splits cleanly into two major flows: upload and processing (write path) and playback (read path).

    Upload Pipeline

    1. Client initiates upload by calling /v1/videos/upload/init. The Upload Service creates a video record with status uploading in PostgreSQL and returns pre-signed URLs for multipart upload directly to S3.
    2. Client uploads chunks directly to object storage using the pre-signed URLs. Each chunk is typically 5-10 MB. The client can upload multiple chunks in parallel.
    3. Client finalizes by calling /v1/videos/upload/complete. The Upload Service verifies all parts are present, assembles the object in S3, updates the video status to processing, and publishes a message to the Transcoding Queue (SQS or Kafka); a code sketch of this step follows the list.
    4. Transcoding Pipeline (described in detail in Stage 5) consumes the message, transcodes the video into multiple resolutions, generates HLS manifests and thumbnails, and writes all output to S3.
    5. On completion, the pipeline updates the video record to status ready with the manifest path. The video is now playable.
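
    A hedged sketch of step 3 above, again assuming boto3 for S3 and SQS; the bucket, queue URL, and shape of the parts list are illustrative:

    ```python
    import json
    import boto3

    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")
    BUCKET = "raw-video-uploads"  # illustrative
    TRANSCODE_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/transcode-jobs"  # illustrative

    def complete_upload(video_id: str, upload_id: str, parts: list[dict]) -> None:
        """parts: [{'ETag': ..., 'PartNumber': ...}, ...] as reported by the client."""
        key = f"raw/{video_id}"

        # Assemble the final object from the uploaded chunks.
        s3.complete_multipart_upload(
            Bucket=BUCKET, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )

        # update_video_status(video_id, "processing")  # PostgreSQL update, not shown

        # Kick off the transcoding pipeline asynchronously.
        sqs.send_message(
            QueueUrl=TRANSCODE_QUEUE_URL,
            MessageBody=json.dumps({"videoId": video_id, "s3Key": key}),
        )
    ```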

    Playback Flow

    1. Client requests video metadata via /v1/videos/{videoId}. The API returns the manifest URL pointing to the CDN.
    2. Client fetches the HLS master manifest from the CDN edge. This manifest lists available quality levels (resolutions and bitrates).
    3. The video player selects an initial quality based on estimated bandwidth and requests the corresponding media playlist (a list of 2-10 second segment URLs).
    4. The player downloads segments sequentially from the CDN. If bandwidth changes, the player switches to a different quality level seamlessly — this is adaptive bitrate streaming.
    5. If a segment is not cached at the CDN edge, the CDN fetches it from the origin (S3) and caches it for subsequent requests.

    Transcoding DAG

    Transcoding is not a single operation but a directed acyclic graph of tasks:

    • Probe — Inspect the input file (codec, resolution, duration, audio channels).
    • Split — Divide the video into segments (typically 2-10 seconds each) for parallel processing.
    • Transcode (video) — For each segment, encode into each target resolution and codec. This is the most compute-intensive step and runs in parallel across segments and resolutions.
    • Transcode (audio) — Extract and encode audio tracks (AAC, Opus) at multiple bitrates.
    • Generate thumbnails — Extract representative frames at regular intervals for the timeline scrubber and poster images.
    • Package — Assemble HLS/DASH manifests that reference the transcoded segments. Write the master manifest linking all quality levels.
    • Validate — Run quality checks (duration matches, no corrupted segments, manifest parseable).
    • Publish — Update video status to ready, push manifests to CDN warm-up.

    Stage 5: Deep Dive

    We will dive deep into two critical subsystems: the transcoding pipeline and adaptive bitrate streaming.

    Deep Dive 1: Transcoding Pipeline

    At 500 hours of video uploaded per minute, the transcoding system must handle massive throughput while remaining cost-efficient and fault-tolerant.

    Architecture

    The pipeline is orchestrated by a workflow engine (such as AWS Step Functions, Temporal, or Apache Airflow). Each uploaded video triggers a workflow instance that manages the DAG of transcoding tasks.

    • Workers are stateless containers running FFmpeg. They pull tasks from a queue, process a single segment at a specific resolution, and write the output to S3 (see the sketch after this list).
    • Parallelism is the key to performance. A 10-minute video split into 2-second segments produces 300 segments. Each segment is transcoded into 6 resolutions independently, yielding 1,800 tasks that can run in parallel.
    • Spot/preemptible instances reduce cost by 60-80%. Transcoding tasks are idempotent — if a spot instance is reclaimed, the task is simply retried on another worker.
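
    A simplified sketch of such a worker, assuming FFmpeg is installed on the worker image; the encoder flags, paths, and bucket are illustrative rather than a tuned production configuration:

    ```python
    import subprocess
    import boto3

    s3 = boto3.client("s3")
    OUTPUT_BUCKET = "transcoded-video"  # illustrative

    def transcode_segment(local_segment: str, video_id: str, segment_number: int,
                          height: int, bitrate_kbps: int) -> None:
        """Encode one segment at one resolution and upload it to a deterministic path."""
        out_path = f"/tmp/{video_id}_{segment_number}_{height}p.ts"
        subprocess.run(
            ["ffmpeg", "-y", "-i", local_segment,
             "-c:v", "libx264", "-b:v", f"{bitrate_kbps}k",
             "-vf", f"scale=-2:{height}",
             "-c:a", "aac", "-b:a", "128k",
             "-f", "mpegts", out_path],
            check=True,
        )
        # Deterministic key: a retry after a spot reclaim overwrites the same object,
        # which is what makes the task idempotent.
        key = f"videos/{video_id}/{height}p/{segment_number:05d}.ts"
        s3.upload_file(out_path, OUTPUT_BUCKET, key)
    ```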

    Encoding Ladder

    The encoding ladder defines which resolution-bitrate combinations to produce:

    | Resolution | Bitrate (H.264) | Bitrate (H.265) | Use Case |
    |---|---|---|---|
    | 240p | 300 kbps | 150 kbps | Very slow mobile connections |
    | 360p | 600 kbps | 300 kbps | Mobile on 3G |
    | 480p | 1.2 Mbps | 600 kbps | Standard mobile |
    | 720p | 3 Mbps | 1.5 Mbps | Desktop / tablet |
    | 1080p | 6 Mbps | 3 Mbps | HD desktop / smart TV |
    | 4K | 16 Mbps | 8 Mbps | 4K displays |

    A more advanced system uses per-title encoding: analyzing the content complexity of each video to determine optimal bitrates. An animation with flat colors compresses far better than a fast-action sports clip at the same resolution. Netflix pioneered this approach, saving 20% bandwidth without perceptible quality loss.

    Fault Tolerance

    • Idempotent tasks: Each task writes to a deterministic S3 path (videos/{video_id}/{resolution}/{segment_number}.ts). Re-running produces identical output.
    • Dead letter queue: Tasks that fail after 3 retries are sent to a DLQ for manual investigation. The video status transitions to failed with a diagnostic error code.
    • Partial availability: If only some resolutions succeed, the system can publish the video with available resolutions and re-queue the failed ones. A video with 480p and 720p is better than no video at all.

    Deep Dive 2: Adaptive Bitrate Streaming (HLS/DASH)

    Adaptive bitrate (ABR) streaming is the mechanism that lets a video player switch quality levels mid-stream based on real-time network conditions.

    How HLS Works

    HTTP Live Streaming (HLS) uses a two-level manifest structure:

    1. Master Manifest (master.m3u8) — Lists all available quality variants with their resolution, bitrate, and codec. The player downloads this first.
    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=300000,RESOLUTION=426x240
    240p/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=854x480
    480p/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
    1080p/playlist.m3u8
    2. Media Playlist (480p/playlist.m3u8) — Lists the actual video segment URLs with their durations. The player downloads segments sequentially.
    #EXTM3U
    #EXT-X-TARGETDURATION:4
    #EXTINF:4.0,
    segment_001.ts
    #EXTINF:4.0,
    segment_002.ts
    #EXTINF:3.8,
    segment_003.ts

    ABR Algorithm

    The player's ABR algorithm is the brain of the streaming experience. It must balance three competing goals:

    • Maximize quality — Play at the highest resolution the network can sustain.
    • Minimize rebuffering — Never let the playback buffer run empty (causes stalling).
    • Minimize startup time — Start playback fast, even if it means beginning at a lower resolution.

    A simplified ABR strategy:

    1. Start at the lowest quality for fast first-frame rendering (under 2 seconds).
    2. Measure download throughput for each segment. If the last three segments each downloaded at an effective 8 Mbps, applying a 0.8 safety margin gives a usable budget of 6.4 Mbps, so the player can safely step up to 1080p (which requires 6 Mbps).
    3. Monitor the buffer level. If the buffer drops below 5 seconds, immediately drop to a lower quality regardless of throughput estimates.
    4. Use a ramp-up delay — do not jump from 240p to 4K in one step. Increase one quality level per segment to avoid overshooting.
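
    A toy version of this strategy; the ladder mirrors the H.264 column above, the 0.8 margin and 5-second buffer threshold come from the list, and everything else is illustrative:

    ```python
    # Bitrates in Mbps, mirroring the H.264 column of the encoding ladder.
    LADDER = [("240p", 0.3), ("360p", 0.6), ("480p", 1.2),
              ("720p", 3.0), ("1080p", 6.0), ("4K", 16.0)]

    SAFETY_MARGIN = 0.8         # only budget 80% of measured throughput
    PANIC_BUFFER_SECONDS = 5    # below this, step down regardless of throughput

    def choose_quality(current: int, recent_throughput_mbps: list[float],
                       buffer_seconds: float) -> int:
        """Return the LADDER index to use for the next segment."""
        if buffer_seconds < PANIC_BUFFER_SECONDS:
            return max(0, current - 1)                  # rebuffer risk: drop a level now

        budget = min(recent_throughput_mbps) * SAFETY_MARGIN
        best = max((i for i, (_, mbps) in enumerate(LADDER) if mbps <= budget), default=0)

        return current + 1 if best > current else best  # ramp up one level per segment
    ```

    With recent segments measured at 8 Mbps, the budget is 6.4 Mbps, so a player currently at 720p steps up to 1080p (6 Mbps) on the next segment; at 5 Mbps measured, it stays at 720p.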

    HLS vs. DASH

    | Feature | HLS | DASH |
    |---|---|---|
    | Developed by | Apple | MPEG (industry standard) |
    | Container | .ts (MPEG-TS) or .fmp4 | .mp4 (fragmented) |
    | Manifest format | .m3u8 (text-based) | .mpd (XML-based) |
    | Browser support | Native on Safari; MSE-based on others | MSE-based on all browsers |
    | DRM support | FairPlay | Widevine, PlayReady |
    | Codec flexibility | Good (CMAF adds parity) | Excellent |

    In practice, most platforms produce CMAF (Common Media Application Format) — fragmented MP4 segments that are compatible with both HLS and DASH manifests. This means you encode once and generate two manifest formats, avoiding duplicate storage.

    Segment Size Trade-Off

    • Shorter segments (2 seconds): Faster quality switching, lower latency for live streams, but more HTTP requests and higher CDN overhead (more objects to cache).
    • Longer segments (10 seconds): Better compression efficiency and fewer requests, but slower adaptation to bandwidth changes and higher startup latency.
    • The sweet spot for VOD is typically 4-6 seconds. This balances compression efficiency with responsive adaptation.

    Stage 6: Scaling and Trade-Offs

    CDN Strategy

    At 1 billion views per day, the CDN is the most critical infrastructure component. Without it, origin servers would be crushed under the load.

    • Multi-CDN: Use multiple CDN providers (CloudFront, Akamai, Fastly) and route viewers to the best-performing CDN based on real-time latency data. This also provides failover if one CDN experiences an outage.
    • Tiered caching: CDN edges (hundreds of PoPs globally) cache popular segments. Regional mid-tier caches sit between edges and the origin. A cache miss at the edge checks the mid-tier before hitting origin. This reduces origin load by 95%+ for popular content.
    • Popularity-based pre-warming: For trending or newly released videos, proactively push segments to edge caches in regions with high expected viewership. A video from a creator with 50 million subscribers should be pre-cached before the notification goes out.
    • Long-tail optimization: 80% of views go to 20% of videos. The long tail of rarely watched content will always be a cache miss. For these, serve from a single origin region and accept higher latency rather than polluting cache with rarely accessed segments.

    View Counting at Scale

    1 billion views per day is approximately 11,500 view events per second. Naively incrementing a database counter per view would create a massive write hotspot for popular videos.

    • Client-side filtering: The player sends a view event only after 30 seconds of watch time (to filter out accidental clicks). This reduces total events by approximately 40%.
    • Write buffering in Redis: View events are batched in Redis (INCRBY views:{video_id} 1). A background job flushes accumulated counts to Cassandra every 60 seconds.
    • Approximate counts: For real-time display, the Redis counter is accurate enough. For analytics and monetization, the Cassandra event log provides exact counts after reconciliation.
    • Sharded counters: For viral videos with millions of concurrent viewers, a single Redis key becomes a hotspot. Use N sharded keys (views:{video_id}:{shard}) and sum them on read. This trades read complexity for write throughput.
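
    A rough sketch of sharded counters with the redis client; the shard count and key scheme are illustrative:

    ```python
    import random
    import redis

    r = redis.Redis()
    NUM_SHARDS = 16  # illustrative; size this to the write rate of the hottest videos

    def increment_view(video_id: str) -> None:
        # Spread increments across N keys so no single key becomes a write hotspot.
        r.incr(f"views:{video_id}:{random.randrange(NUM_SHARDS)}")

    def get_view_count(video_id: str) -> int:
        # Reads pay the cost: fetch every shard in one MGET and sum the values.
        keys = [f"views:{video_id}:{shard}" for shard in range(NUM_SHARDS)]
        return sum(int(v) for v in r.mget(keys) if v is not None)
    ```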

    Storage Tiers: Hot, Warm, Cold

    With roughly 720 TB of raw uploads per day multiplied by 5-8x for transcoded variants, storage costs grow rapidly. A tiered approach keeps costs manageable:

    | Tier | Storage Class | Content | Access Pattern |
    |---|---|---|---|
    | Hot | S3 Standard | Videos viewed in the last 7 days, trending content | Frequent reads via CDN origin pulls |
    | Warm | S3 Infrequent Access | Videos with 1-100 views/month | Occasional reads, slightly higher retrieval latency |
    | Cold | S3 Glacier Instant Retrieval | Videos with less than 1 view/month | Rare reads, millisecond retrieval on demand |

    S3 Lifecycle policies automatically transition objects between tiers based on access patterns. The master copy (highest quality transcode) is always retained. Lower resolutions of cold content can be deleted and re-transcoded on demand if a viewer requests them — this saves significant storage at the cost of occasional transcoding latency.
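
    As a sketch, age-based lifecycle rules along these lines (via boto3) can approximate the hot/warm/cold transitions; the day thresholds and prefix are illustrative, and a system keyed strictly on view counts would more likely use S3 Intelligent-Tiering or an application-level job:

    ```python
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="transcoded-video",  # illustrative
        LifecycleConfiguration={
            "Rules": [{
                "ID": "tier-video-segments",
                "Status": "Enabled",
                "Filter": {"Prefix": "videos/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},   # warm
                    {"Days": 180, "StorageClass": "GLACIER_IR"},   # cold (Instant Retrieval)
                ],
            }]
        },
    )
    ```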

    Copyright Detection (Content ID)

    A video streaming platform at scale must detect copyrighted content to avoid legal liability and protect creators.

    • Audio fingerprinting: Extract a perceptual hash of the audio track and compare against a database of copyrighted works. Algorithms like Chromaprint produce compact fingerprints that are robust to compression, pitch shifts, and background noise.
    • Video fingerprinting: Extract visual fingerprints from keyframes. Compare against a reference database using similarity search (approximate nearest neighbors via FAISS or ScaNN); a sketch of this lookup follows the list.
    • Pipeline integration: Content ID runs as a step in the transcoding DAG. After transcoding completes, the fingerprinting step runs before the video is published. If a match is found, the video is flagged for review or automatically handled per the copyright holder's policy (block, monetize, or track).
    • Scale: With 500 hours uploaded per minute, the fingerprinting system must process content faster than real-time. Batch processing on GPU clusters (for video fingerprinting) and CPU workers (for audio) handles the throughput.
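
    A hedged sketch of the nearest-neighbor lookup with FAISS, assuming fingerprints have already been reduced to fixed-length float vectors; the dimensionality and match threshold are illustrative:

    ```python
    import faiss
    import numpy as np

    DIM = 128  # illustrative fingerprint vector length
    index = faiss.IndexFlatL2(DIM)

    def add_reference_fingerprints(vectors: np.ndarray) -> None:
        """vectors: (n, DIM) float32 fingerprints of copyrighted reference content."""
        index.add(np.ascontiguousarray(vectors, dtype="float32"))

    def find_matches(query: np.ndarray, k: int = 5, threshold: float = 0.1) -> list[tuple]:
        """Return (distance, reference_row) pairs below an illustrative distance threshold."""
        q = np.ascontiguousarray(query, dtype="float32").reshape(1, -1)
        distances, ids = index.search(q, k)
        return [(d, i) for d, i in zip(distances[0], ids[0]) if d < threshold]
    ```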

    Scoring Tips

    To score well on a video streaming design question, keep these principles in mind:

    • Separate upload from playback. These are fundamentally different systems with different latency requirements, scale characteristics, and failure modes. Interviewers expect you to treat them as independent flows.
    • Explain the transcoding pipeline in detail. This is where most of the technical complexity lives. Show you understand the DAG structure, parallelization strategy, encoding ladders, and fault tolerance. Mentioning per-title encoding or CMAF demonstrates depth.
    • Know adaptive bitrate streaming cold. Be able to explain the two-level manifest structure, how the ABR algorithm selects quality, and the segment size trade-off. This is the core technology that makes streaming work.
    • Quantify your CDN strategy. Do not just say "use a CDN." Explain tiered caching, cache hit ratios, pre-warming for popular content, and the long-tail problem. Show you understand that the CDN is not a magic box — it has capacity limits and cache eviction policies.
    • Address the hard scaling problems proactively. View counting at scale, storage tiering, and content moderation are areas where interviewers probe for production-level thinking. A candidate who brings up sharded counters and hot/warm/cold storage unprompted stands out.
    • Show cost awareness. Storage and CDN bandwidth are the two largest cost drivers. Mentioning spot instances for transcoding, S3 lifecycle policies, and multi-CDN cost optimization signals that you think about systems holistically — not just correctness and performance.

    Practice delivering this architecture end-to-end in under 35 minutes. Focus on smooth transitions between stages — requirements should naturally motivate your API design, which should inform your data model, which feeds into your architecture. If you can walk through each stage while fielding follow-up questions confidently, you are well-prepared. Tools like Hoppers AI can help you rehearse this flow with real-time feedback on structure, pacing, and technical depth.