Design a Video Streaming Platform — Complete System Design Walkthrough
Video streaming is one of the most demanding system design problems you will encounter in interviews. It combines massive storage requirements, compute-intensive processing pipelines, real-time delivery under strict latency constraints, and a global CDN strategy. In this walkthrough, we will design a platform similar to YouTube or Netflix from scratch, covering each stage of a structured system design interview.
Stage 1: Requirements Gathering
Start by defining scope with your interviewer. Video streaming platforms have enormous surface area, so narrowing early is essential. Spend 3-5 minutes here.
Functional Requirements
- Video upload — Creators upload videos of varying sizes (up to 10 GB). The system accepts the raw file, validates it, and prepares it for streaming.
- Video transcoding — Convert uploaded videos into multiple resolutions (240p, 360p, 480p, 720p, 1080p, 4K) and codecs (H.264, H.265/HEVC, VP9, AV1) for broad device compatibility.
- Adaptive bitrate streaming — Deliver video using HLS or DASH so the player can dynamically switch quality based on the viewer's network conditions.
- Video search — Users can search for videos by title, description, tags, and creator name.
- Recommendations — Personalized video feed based on watch history, likes, and trending content.
- Video playback — Low-latency start (under 2 seconds), seek support, and smooth playback across devices (web, mobile, smart TV).
Non-Functional Requirements
- Scale: 2 billion total users, 500 million DAU, 500 hours of video uploaded per minute, 1 billion video views per day.
- Latency: Video playback start under 2 seconds. Upload acknowledgment (not processing) within seconds.
- Availability: 99.99% for playback. Upload pipeline can tolerate slightly lower availability (99.9%) with retry semantics.
- Durability: Zero data loss on uploaded videos. Once a creator uploads a video, it must never be lost.
- Storage: At 500 hours/minute with an average of 1 GB per hour of raw video, that is 500 GB of raw uploads per minute — roughly 720 TB per day. After transcoding to multiple resolutions, storage multiplies by 5-8x.
Interview tip: Convert your scale numbers into actionable metrics early. 1 billion views/day equals roughly 11,500 video starts per second on average, with peaks of 3-5x during prime time. These numbers directly inform your CDN capacity and origin server sizing.
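These conversions are worth being fluent with; a quick back-of-envelope sketch using the numbers above (the 4x peak multiplier is an assumption within the 3-5x range mentioned):

```python
# Back-of-envelope capacity math from the stated requirements.
VIEWS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 86_400

avg_starts_per_sec = VIEWS_PER_DAY / SECONDS_PER_DAY    # ~11,574 starts/sec
peak_starts_per_sec = avg_starts_per_sec * 4            # assumed 4x prime-time peak

# Raw upload volume: 500 hours of video per minute, ~1 GB per hour of video.
upload_gb_per_min = 500 * 1
raw_upload_tb_per_day = upload_gb_per_min * 60 * 24 / 1000   # 720 TB/day

# Transcoded variants multiply storage by roughly 5-8x.
total_tb_low = raw_upload_tb_per_day * 5
total_tb_high = raw_upload_tb_per_day * 8

print(round(avg_starts_per_sec), round(peak_starts_per_sec))  # 11574 46296
print(raw_upload_tb_per_day, total_tb_low, total_tb_high)     # 720.0 3600.0 5760.0
```

Numbers like these directly size the CDN fleet (peak starts/sec) and the storage budget (TB/day).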
Stage 2: API Design
A video streaming platform exposes REST APIs for upload and metadata management, and uses streaming protocols (HLS/DASH) for video delivery.
Upload API
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/videos/upload/init | Initialize multipart upload. Returns uploadId and pre-signed URLs for each chunk. |
| PUT | /v1/videos/upload/{uploadId}/parts/{partNumber} | Upload a single chunk (pre-signed URL to object storage). |
| POST | /v1/videos/upload/{uploadId}/complete | Finalize upload. Triggers transcoding pipeline. Body: { title, description, tags[], categoryId, visibility } |
Multipart upload is essential for large files. The client uploads directly to object storage using pre-signed URLs, bypassing our application servers entirely. This keeps upload bandwidth off our compute layer.
Video Metadata API
| Method | Endpoint | Description |
|---|---|---|
| GET | /v1/videos/{videoId} | Get video metadata (title, description, view count, manifest URL, thumbnails). |
| GET | /v1/videos/{videoId}/stream | Returns the HLS/DASH manifest URL (redirects to CDN). Player uses this to begin adaptive playback. |
| GET | /v1/search?q={query}&cursor=&limit=20 | Search videos by text query. Returns paginated results with cursor. |
| GET | /v1/feed?cursor=&limit=20 | Personalized recommendation feed for authenticated user. |
| POST | /v1/videos/{videoId}/views | Record a view event. Fire-and-forget from the client. |
Design decision: Why pre-signed URLs for upload? Uploading multi-gigabyte files through our API servers would consume enormous bandwidth and create a bottleneck. Pre-signed URLs let clients upload directly to S3 (or GCS), and we only handle the lightweight metadata requests. This pattern scales independently of upload volume.
Stage 3: Data Model
A video streaming platform demands polyglot persistence — no single database handles all access patterns optimally.
Video Metadata (PostgreSQL / Vitess)
| Column | Type | Notes |
|---|---|---|
| video_id | UUID (PK) | Globally unique identifier |
| creator_id | UUID (FK) | References users table |
| title | varchar(500) | Searchable |
| description | text | Searchable |
| status | enum | uploading, processing, ready, failed, removed |
| visibility | enum | public, unlisted, private |
| duration_seconds | int | Set after transcoding completes |
| manifest_path | varchar | S3 path to HLS master playlist |
| thumbnail_urls | jsonb | Auto-generated + custom thumbnails |
| tags | text[] | Used for search and recommendations |
| created_at | timestamp | |
| updated_at | timestamp | |
User Data (PostgreSQL)
| Column | Type | Notes |
|---|---|---|
| user_id | UUID (PK) | |
| username | varchar (unique) | |
| email | varchar (unique) | |
| subscriber_count | bigint | Denormalized counter |
| created_at | timestamp | |
View Counts (Redis + Cassandra)
View counting is a special problem at this scale. We use a two-tier approach:
- Redis — Real-time counter. Each view event increments a counter via `INCR views:{video_id}`. The value displayed to users reads from Redis.
- Cassandra — Durable event log. Every view event is written to a Cassandra table partitioned by `video_id` and bucketed by date. This feeds analytics, monetization, and reconciliation jobs that periodically sync Redis counters.
Comments (Cassandra)
| Column | Type | Role |
|---|---|---|
| video_id | UUID | Partition key |
| comment_id | TimeUUID | Clustering key (DESC) |
| user_id | UUID | |
| content | text | |
| parent_comment_id | UUID (nullable) | For threaded replies |
| created_at | timestamp | |
Search Index (Elasticsearch)
Video metadata (title, description, tags, creator name) is indexed in Elasticsearch. A CDC pipeline (Debezium or application-level dual writes) keeps the search index in sync with the primary PostgreSQL store. Elasticsearch handles full-text search, fuzzy matching, and relevance scoring.
Storage Choice Rationale
- PostgreSQL for metadata: Relational integrity for users and video metadata. At hundreds of millions of rows (not billions), sharded PostgreSQL (Vitess) handles the load. Strong consistency for ownership and visibility controls.
- Cassandra for views and comments: Append-heavy, partition-friendly access patterns. Views are partitioned by video_id with date bucketing. Comments are partitioned by video_id for co-located reads.
- Redis for real-time counters: Sub-millisecond reads for view counts displayed on every page load. Eventual consistency with Cassandra is acceptable — a count being off by a few hundred on a video with millions of views is invisible to users.
- Elasticsearch for search: Full-text search with relevance ranking, autocomplete, and typo tolerance are all native capabilities.
- Object storage (S3/GCS) for video files: The actual video segments, manifests, and thumbnails. Virtually unlimited capacity with 11 nines of durability.
Stage 4: High-Level Architecture
The architecture splits cleanly into two major flows: upload and processing (write path) and playback (read path).
Upload Pipeline
- Client initiates upload by calling `/v1/videos/upload/init`. The Upload Service creates a video record with status `uploading` in PostgreSQL and returns pre-signed URLs for multipart upload directly to S3.
- Client uploads chunks directly to object storage using the pre-signed URLs. Each chunk is typically 5-10 MB. The client can upload multiple chunks in parallel.
- Client finalizes by calling `/v1/videos/upload/complete`. The Upload Service verifies all parts are present, assembles the object in S3, updates the video status to `processing`, and publishes a message to the Transcoding Queue (SQS or Kafka).
- Transcoding Pipeline (described in detail in Stage 5) consumes the message, transcodes the video into multiple resolutions, generates HLS manifests and thumbnails, and writes all output to S3.
- On completion, the pipeline updates the video record to status `ready` with the manifest path. The video is now playable.
Playback Flow
- Client requests video metadata via `/v1/videos/{videoId}`. The API returns the manifest URL pointing to the CDN.
- Client fetches the HLS master manifest from the CDN edge. This manifest lists available quality levels (resolutions and bitrates).
- The video player selects an initial quality based on estimated bandwidth and requests the corresponding media playlist (a list of 2-10 second segment URLs).
- The player downloads segments sequentially from the CDN. If bandwidth changes, the player switches to a different quality level seamlessly — this is adaptive bitrate streaming.
- If a segment is not cached at the CDN edge, the CDN fetches it from the origin (S3) and caches it for subsequent requests.
Transcoding DAG
Transcoding is not a single operation but a directed acyclic graph of tasks:
- Probe — Inspect the input file (codec, resolution, duration, audio channels).
- Split — Divide the video into segments (typically 2-10 seconds each) for parallel processing.
- Transcode (video) — For each segment, encode into each target resolution and codec. This is the most compute-intensive step and runs in parallel across segments and resolutions.
- Transcode (audio) — Extract and encode audio tracks (AAC, Opus) at multiple bitrates.
- Generate thumbnails — Extract representative frames at regular intervals for the timeline scrubber and poster images.
- Package — Assemble HLS/DASH manifests that reference the transcoded segments. Write the master manifest linking all quality levels.
- Validate — Run quality checks (duration matches, no corrupted segments, manifest parseable).
- Publish — Update video status to `ready` and push manifests to the CDN for cache warm-up.
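The Package step above can be sketched as a function that emits the HLS master playlist from the encoding ladder. Variant names and bandwidth values below are illustrative:

```python
def build_master_manifest(variants: list[dict]) -> str:
    """Emit an HLS master playlist linking one media playlist per variant."""
    lines = ["#EXTM3U"]
    for v in variants:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v['bandwidth']},"
            f"RESOLUTION={v['width']}x{v['height']}"
        )
        lines.append(f"{v['name']}/playlist.m3u8")
    return "\n".join(lines) + "\n"

ladder = [
    {"name": "240p", "bandwidth": 300_000, "width": 426, "height": 240},
    {"name": "480p", "bandwidth": 1_200_000, "width": 854, "height": 480},
    {"name": "1080p", "bandwidth": 6_000_000, "width": 1920, "height": 1080},
]
print(build_master_manifest(ladder))
```

Each referenced media playlist is generated the same way from the segment list for that rendition.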
Stage 5: Deep Dive
We will dive deep into two critical subsystems: the transcoding pipeline and adaptive bitrate streaming.
Deep Dive 1: Transcoding Pipeline
At 500 hours of video uploaded per minute, the transcoding system must handle massive throughput while remaining cost-efficient and fault-tolerant.
Architecture
The pipeline is orchestrated by a workflow engine (such as AWS Step Functions, Temporal, or Apache Airflow). Each uploaded video triggers a workflow instance that manages the DAG of transcoding tasks.
- Workers are stateless containers running FFmpeg. They pull tasks from a queue, process a single segment at a specific resolution, and write the output to S3.
- Parallelism is the key to performance. A 10-minute video split into 2-second segments produces 300 segments. Each segment is transcoded into 6 resolutions independently, yielding 1,800 tasks that can run in parallel.
- Spot/preemptible instances reduce cost by 60-80%. Transcoding tasks are idempotent — if a spot instance is reclaimed, the task is simply retried on another worker.
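A sketch of the per-segment fan-out: each task is identified by a (video, segment, rendition) triple and writes to a deterministic output path, which is what makes retries on reclaimed spot instances safe. The path layout and the FFmpeg invocation in the comment are illustrative, not a fixed convention:

```python
from itertools import product

RENDITIONS = ["240p", "360p", "480p", "720p", "1080p", "4k"]

def plan_tasks(video_id: str, num_segments: int) -> list[dict]:
    """Fan a video out into one task per (segment, rendition) pair."""
    return [
        {
            "video_id": video_id,
            "segment": seg,
            "rendition": r,
            # Deterministic output key: re-running a task overwrites the
            # same object, so retries after spot reclamation are idempotent.
            "output_key": f"videos/{video_id}/{r}/{seg:05d}.ts",
        }
        for seg, r in product(range(num_segments), RENDITIONS)
    ]

# A worker pops one task and shells out to FFmpeg, roughly:
#   ffmpeg -i {input_segment} -vf scale=-2:720 -c:v libx264 -b:v 3M {output}
# (flags illustrative; real ladders pin codec profiles and keyframe alignment)

tasks = plan_tasks("abc123", num_segments=300)
print(len(tasks))  # 300 segments x 6 renditions = 1800 parallel tasks
```

This is the 10-minute-video example from above: 300 two-second segments times 6 renditions yields 1,800 independent tasks.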
Encoding Ladder
The encoding ladder defines which resolution-bitrate combinations to produce:
| Resolution | Bitrate (H.264) | Bitrate (H.265) | Use Case |
|---|---|---|---|
| 240p | 300 kbps | 150 kbps | Very slow mobile connections |
| 360p | 600 kbps | 300 kbps | Mobile on 3G |
| 480p | 1.2 Mbps | 600 kbps | Standard mobile |
| 720p | 3 Mbps | 1.5 Mbps | Desktop / tablet |
| 1080p | 6 Mbps | 3 Mbps | HD desktop / smart TV |
| 4K | 16 Mbps | 8 Mbps | 4K displays |
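The ladder also drives storage math: summing the per-rendition bitrates gives the transcoded footprint per hour of content. A quick sketch using the H.264 column above:

```python
# H.264 bitrates from the encoding ladder, in kbps.
LADDER_KBPS = {"240p": 300, "360p": 600, "480p": 1200,
               "720p": 3000, "1080p": 6000, "4k": 16000}

def transcoded_gb_per_hour(ladder_kbps: dict[str, int]) -> float:
    """Total storage for one hour of content across every rendition."""
    total_kbits = sum(ladder_kbps.values()) * 3600   # kbps * seconds in an hour
    return total_kbits / 8 / 1_000_000               # kbits -> GB (decimal)

gb = transcoded_gb_per_hour(LADDER_KBPS)
print(gb)  # ~12.2 GB per source hour for the six H.264 renditions alone
```

The exact multiplier over raw storage depends on source bitrate, audio tracks, and how many codec families (H.265, VP9, AV1) you also produce — which is why the 5-8x figure in Stage 1 is a planning estimate, not a constant.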
A more advanced system uses per-title encoding: analyzing the content complexity of each video to determine optimal bitrates. An animation with flat colors compresses far better than a fast-action sports clip at the same resolution. Netflix pioneered this approach, saving 20% bandwidth without perceptible quality loss.
Fault Tolerance
- Idempotent tasks: Each task writes to a deterministic S3 path (`videos/{video_id}/{resolution}/{segment_number}.ts`). Re-running produces identical output.
- Dead letter queue: Tasks that fail after 3 retries are sent to a DLQ for manual investigation. The video status transitions to `failed` with a diagnostic error code.
- Partial availability: If only some resolutions succeed, the system can publish the video with available resolutions and re-queue the failed ones. A video with 480p and 720p is better than no video at all.
Deep Dive 2: Adaptive Bitrate Streaming (HLS/DASH)
Adaptive bitrate (ABR) streaming is the mechanism that lets a video player switch quality levels mid-stream based on real-time network conditions.
How HLS Works
HTTP Live Streaming (HLS) uses a two-level manifest structure:
- Master Manifest (`master.m3u8`) — Lists all available quality variants with their resolution, bitrate, and codec. The player downloads this first.

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=300000,RESOLUTION=426x240
240p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=854x480
480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
```

- Media Playlist (`480p/playlist.m3u8`) — Lists the actual video segment URLs with their durations. The player downloads segments sequentially.

```
#EXTM3U
#EXT-X-TARGETDURATION:4
#EXTINF:4.0,
segment_001.ts
#EXTINF:4.0,
segment_002.ts
#EXTINF:3.8,
segment_003.ts
```

ABR Algorithm
The player's ABR algorithm is the brain of the streaming experience. It must balance three competing goals:
- Maximize quality — Play at the highest resolution the network can sustain.
- Minimize rebuffering — Never let the playback buffer run empty (causes stalling).
- Minimize startup time — Start playback fast, even if it means beginning at a lower resolution.
A simplified ABR strategy:
- Start at the lowest quality for fast first-frame rendering (under 2 seconds).
- Measure download throughput for each segment and apply a 0.8 safety margin to the estimate. If the last 3 segments downloaded at 5 Mbps, usable bandwidth is 5 × 0.8 = 4 Mbps — enough to switch up to 720p (3 Mbps) but not 1080p (6 Mbps).
- Monitor the buffer level. If the buffer drops below 5 seconds, immediately drop to a lower quality regardless of throughput estimates.
- Use a ramp-up delay — do not jump from 240p to 4K in one step. Increase one quality level per segment to avoid overshooting.
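The strategy above can be condensed into one selection function — safety margin on measured throughput, a buffer panic threshold, and one-level ramp-up. This is a minimal sketch, not a production ABR controller; thresholds and names are illustrative:

```python
BITRATE_KBPS = [300, 600, 1200, 3000, 6000, 16000]  # 240p ... 4K ladder

def next_level(current: int, throughput_kbps: float,
               buffer_s: float, safety: float = 0.8,
               panic_buffer_s: float = 5.0) -> int:
    """Pick the quality level index for the upcoming segment."""
    if buffer_s < panic_buffer_s and current > 0:
        return current - 1                      # buffer panic: drop a level
    usable = throughput_kbps * safety           # hedge against estimate error
    if current + 1 < len(BITRATE_KBPS) and BITRATE_KBPS[current + 1] <= usable:
        return current + 1                      # ramp up one level at a time
    if BITRATE_KBPS[current] > usable and current > 0:
        return current - 1                      # current level unsustainable
    return current

# 5 Mbps measured: usable = 4 Mbps, enough to hold 720p but not reach 1080p.
print(next_level(current=3, throughput_kbps=5000, buffer_s=20.0))  # 3
```

Real players (dash.js, hls.js, ExoPlayer) layer smoothing, buffer-based rules, and sometimes learned policies on top of this basic shape.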
HLS vs. DASH
| Feature | HLS | DASH |
|---|---|---|
| Developed by | Apple | MPEG (industry standard) |
| Container | .ts (MPEG-TS) or .fmp4 | .mp4 (fragmented) |
| Manifest format | .m3u8 (text-based) | .mpd (XML-based) |
| Browser support | Native on Safari; MSE-based on others | MSE-based on all browsers |
| DRM support | FairPlay | Widevine, PlayReady |
| Codec flexibility | Good (CMAF adds parity) | Excellent |
In practice, most platforms produce CMAF (Common Media Application Format) — fragmented MP4 segments that are compatible with both HLS and DASH manifests. This means you encode once and generate two manifest formats, avoiding duplicate storage.
Segment Size Trade-Off
- Shorter segments (2 seconds): Faster quality switching, lower latency for live streams, but more HTTP requests and higher CDN overhead (more objects to cache).
- Longer segments (10 seconds): Better compression efficiency and fewer requests, but slower adaptation to bandwidth changes and higher startup latency.
- The sweet spot for VOD is typically 4-6 seconds. This balances compression efficiency with responsive adaptation.
Stage 6: Scaling and Trade-Offs
CDN Strategy
At 1 billion views per day, the CDN is the most critical infrastructure component. Without it, origin servers would be crushed under the load.
- Multi-CDN: Use multiple CDN providers (CloudFront, Akamai, Fastly) and route viewers to the best-performing CDN based on real-time latency data. This also provides failover if one CDN experiences an outage.
- Tiered caching: CDN edges (hundreds of PoPs globally) cache popular segments. Regional mid-tier caches sit between edges and the origin. A cache miss at the edge checks the mid-tier before hitting origin. This reduces origin load by 95%+ for popular content.
- Popularity-based pre-warming: For trending or newly released videos, proactively push segments to edge caches in regions with high expected viewership. A video from a creator with 50 million subscribers should be pre-cached before the notification goes out.
- Long-tail optimization: 80% of views go to 20% of videos. The long tail of rarely watched content will always be a cache miss. For these, serve from a single origin region and accept higher latency rather than polluting cache with rarely accessed segments.
View Counting at Scale
1 billion views per day is approximately 11,500 view events per second. Naively incrementing a database counter per view would create a massive write hotspot for popular videos.
- Client-side batching: The player sends a view event after 30 seconds of watch time (to filter out accidental clicks). This reduces total events by approximately 40%.
- Write buffering in Redis: View events are batched in Redis (`INCRBY views:{video_id} 1`). A background job flushes accumulated counts to Cassandra every 60 seconds.
- Approximate counts: For real-time display, the Redis counter is accurate enough. For analytics and monetization, the Cassandra event log provides exact counts after reconciliation.
- Sharded counters: For viral videos with millions of concurrent viewers, a single Redis key becomes a hotspot. Use N sharded keys (`views:{video_id}:{shard}`) and sum them on read. This trades read complexity for write throughput.
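The sharded-counter pattern can be sketched with a plain dict standing in for Redis; in production the same logic would be `INCRBY` on write and `MGET` across the shard keys on read. Key layout and shard count are illustrative:

```python
import random

NUM_SHARDS = 16
store: dict[str, int] = {}   # stands in for Redis in this sketch

def record_view(video_id: str) -> None:
    """Spread increments across N keys so no single key is a write hotspot."""
    shard = random.randrange(NUM_SHARDS)
    key = f"views:{video_id}:{shard}"
    store[key] = store.get(key, 0) + 1          # Redis: INCRBY key 1

def read_views(video_id: str) -> int:
    """Sum all shards on read (Redis: MGET over the N shard keys)."""
    return sum(store.get(f"views:{video_id}:{s}", 0) for s in range(NUM_SHARDS))

for _ in range(10_000):
    record_view("viral42")
print(read_views("viral42"))  # 10000 — no increments lost across shards
```

Picking N is a tuning knob: more shards means more write throughput but a wider fan-in on every read, so only the hottest keys should be sharded aggressively.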
Storage Tiers: Hot, Warm, Cold
With roughly 720 TB of raw uploads per day multiplied by 5-8x for transcoded variants, storage costs grow rapidly. A tiered approach keeps costs manageable:
| Tier | Storage Class | Content | Access Pattern |
|---|---|---|---|
| Hot | S3 Standard | Videos viewed in the last 7 days, trending content | Frequent reads via CDN origin pulls |
| Warm | S3 Infrequent Access | Videos with 1-100 views/month | Occasional reads, slightly higher retrieval latency |
| Cold | S3 Glacier Instant Retrieval | Videos with less than 1 view/month | Rare reads, millisecond retrieval on demand |
S3 Lifecycle policies automatically transition objects between tiers based on access patterns. The master copy (highest quality transcode) is always retained. Lower resolutions of cold content can be deleted and re-transcoded on demand if a viewer requests them — this saves significant storage at the cost of occasional transcoding latency.
Copyright Detection (Content ID)
A video streaming platform at scale must detect copyrighted content to avoid legal liability and protect creators.
- Audio fingerprinting: Extract a perceptual hash of the audio track and compare against a database of copyrighted works. Algorithms like Chromaprint produce compact fingerprints that are robust to compression, pitch shifts, and background noise.
- Video fingerprinting: Extract visual fingerprints from keyframes. Compare against a reference database using similarity search (approximate nearest neighbors via FAISS or ScaNN).
- Pipeline integration: Content ID runs as a step in the transcoding DAG. After transcoding completes, the fingerprinting step runs before the video is published. If a match is found, the video is flagged for review or automatically handled per the copyright holder's policy (block, monetize, or track).
- Scale: With 500 hours uploaded per minute, the fingerprinting system must process content faster than real-time. Batch processing on GPU clusters (for video fingerprinting) and CPU workers (for audio) handles the throughput.
Scoring Tips
To score well on a video streaming design question, keep these principles in mind:
- Separate upload from playback. These are fundamentally different systems with different latency requirements, scale characteristics, and failure modes. Interviewers expect you to treat them as independent flows.
- Explain the transcoding pipeline in detail. This is where most of the technical complexity lives. Show you understand the DAG structure, parallelization strategy, encoding ladders, and fault tolerance. Mentioning per-title encoding or CMAF demonstrates depth.
- Know adaptive bitrate streaming cold. Be able to explain the two-level manifest structure, how the ABR algorithm selects quality, and the segment size trade-off. This is the core technology that makes streaming work.
- Quantify your CDN strategy. Do not just say "use a CDN." Explain tiered caching, cache hit ratios, pre-warming for popular content, and the long-tail problem. Show you understand that the CDN is not a magic box — it has capacity limits and cache eviction policies.
- Address the hard scaling problems proactively. View counting at scale, storage tiering, and content moderation are areas where interviewers probe for production-level thinking. A candidate who brings up sharded counters and hot/warm/cold storage unprompted stands out.
- Show cost awareness. Storage and CDN bandwidth are the two largest cost drivers. Mentioning spot instances for transcoding, S3 lifecycle policies, and multi-CDN cost optimization signals that you think about systems holistically — not just correctness and performance.
Practice delivering this architecture end-to-end in under 35 minutes. Focus on smooth transitions between stages — requirements should naturally motivate your API design, which should inform your data model, which feeds into your architecture. If you can walk through each stage while fielding follow-up questions confidently, you are well-prepared. Tools like Hoppers AI can help you rehearse this flow with real-time feedback on structure, pacing, and technical depth.