Should I use H.264, H.265, or AV1 for streaming video?

Keep H.264 as your fallback — virtually every device supports it. H.265 saves about 37% bitrate at equal quality and has hardware decode on iPhone 7+ and most 4K TVs, but its patent licensing is messy. AV1 saves about 53% versus H.264 and is royalty-free, though hardware decode only arrived with recent devices like iPhone 15 Pro. A common strategy: H.264 fallback plus H.265 for iOS and AV1 for modern devices.

How much does video compression actually reduce file size?

Uncompressed 1080p 30 fps video is about 89 MB per second — a 90-minute movie would be roughly 480 GB, and 4K around 1.9 TB. Modern codecs compress this to under 1% of raw size using intra-frame compression (DCT, like JPEG) plus inter-frame compression (motion vectors storing only differences between frames). A real Netflix 4K movie ends up at just 5–15 GB, a 100–500x reduction.

What is a good VMAF score for video quality?

VMAF, Netflix's open-source perceptual quality metric, scores video from 0 to 100 based on machine-learning fusion of multiple features. A score of 93 or above is considered visually lossless, around 80 is high quality, around 60 is acceptable, and 40 or below shows obvious compression artifacts. It correlates with human perception far better than older pixel-based metrics like PSNR or SSIM.

VOD Deep Dive Part 2: Video Codecs — Why a 4K Movie Fits in 5 GB

How video compression works, why H.264 still dominates, when to choose H.265 or AV1, per-title encoding, VMAF quality metrics, and hands-on ffmpeg examples.

zhuermu · May 10, 2026 · 25 min

Tags: vod, streaming, codec, h264, h265, av1, ffmpeg

This is Part 2 of the VOD Streaming Deep Dive series.

Let’s Do the Math: How Big Is Uncompressed Video?

From Part 1, we know:

A 1080p, 30 fps, YUV 4:2:0, 8-bit uncompressed video stream:

Per-second size = 1920 × 1080 × 1.5 (YUV 4:2:0) × 30 ÷ 1024² ≈ 89 MB/sec

So:

1 minute ≈ 5.3 GB
90-minute movie ≈ 480 GB
4K movie (4× the pixels) ≈ 1.9 TB
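
The arithmetic is easy to reproduce. Here is a minimal sketch, assuming 8-bit samples (so YUV 4:2:0 averages 1.5 bytes per pixel). The function name `raw_bytes_per_second` is ours and does not come from any library. Note that dividing by 1024² technically yields MiB, which this article labels MB.

```python
def raw_bytes_per_second(width: int, height: int, fps: int,
                         bytes_per_pixel: float = 1.5) -> float:
    """Uncompressed data rate; 1.5 bytes/pixel corresponds to 8-bit YUV 4:2:0."""
    return width * height * bytes_per_pixel * fps


MIB = 1024 ** 2

rate_1080p = raw_bytes_per_second(1920, 1080, 30)
rate_4k = raw_bytes_per_second(3840, 2160, 30)

print(f"1080p30: {rate_1080p / MIB:.0f} MiB/s")                  # 1080p30: 89 MiB/s
print(f"4K30 is {rate_4k / rate_1080p:.0f}x the 1080p30 rate")   # 4K30 is 4x the 1080p30 rate
```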

A real Netflix 4K movie is 5–15 GB. That means:

Encoding compresses video to less than 1% of its raw size.

Not magic — decades of mathematics. Here’s how.

Encoding and Decoding: Two Sides of a Coin

Raw video (89 MB/s)          Compressed video (2 MB/s)        Display
    ┃                             ┃                            ┃
    ▼                             ▼                            ▼
 ┌──────┐    encode           ┌──────┐    decode            ┌──────┐
 │ Raw  │  ──────────────►   │ File │  ────────────────►   │ Play │
 │ file │    (x264/x265)     │      │   (player/hardware)  │      │
 └──────┘                    └──────┘                      └──────┘

Encoding: Compress a large file into a small one → slow, CPU/GPU-intensive
Decoding: Reconstruct the image from the compressed file → fast, phones have dedicated hardware

Encoder + Decoder = Codec (coder-decoder).

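Why is encoding so much slower? The encoder has to search a huge space of options (block sizes, prediction modes, motion vectors) for a combination that compresses well, while the decoder just follows the instructions it is given. It is like working out how to pack an oddly shaped item into a box versus simply opening the box.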

The Two Axes of Video Compression

Axis 1: Intra-frame Compression

Compress each image on its own — similar to JPEG:

The human eye is sensitive to low-frequency information (large color blocks) but not high-frequency detail (noise, fine edges)
DCT (Discrete Cosine Transform) converts pixels from spatial to frequency domain
Unimportant high-frequency coefficients are quantized away

This produces I-frames — each independently decodable.
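
To make the transform-then-quantize idea concrete, here is a small sketch in pure Python. It is deliberately simplified: it runs a 1-D DCT over a single row of 8 samples, whereas real codecs apply 2-D (often integer) transforms to whole blocks. The sample values and the step size of 10 are made up for illustration.

```python
import math


def dct(samples):
    """Orthonormal 1-D DCT-II: spatial samples -> frequency coefficients."""
    n = len(samples)
    out = []
    for k in range(n):
        total = sum(samples[i] * math.cos(math.pi * (i + 0.5) * k / n)
                    for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * total)
    return out


def idct(coeffs):
    """Inverse of dct(): frequency coefficients -> spatial samples."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = coeffs[0] * math.sqrt(1 / n)
        total += sum(coeffs[k] * math.sqrt(2 / n)
                     * math.cos(math.pi * (i + 0.5) * k / n)
                     for k in range(1, n))
        out.append(total)
    return out


def quantize(coeffs, step):
    """The lossy step: coefficients smaller than step/2 become 0."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    return [level * step for level in levels]


row = [52, 55, 61, 66, 70, 61, 64, 73]   # one row of luma samples (made up)
levels = quantize(dct(row), step=10)
restored = idct(dequantize(levels, step=10))

print("quantized levels:", levels)
print("restored samples:", [round(v) for v in restored])
```

A larger `step` zeroes more coefficients, which shrinks the data that has to be stored and increases the reconstruction error. That is the quality/compression knob in miniature.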

Axis 2: Inter-frame Compression

Exploit the fact that adjacent frames are nearly identical — only store differences.

Frame N (I-frame): Complete image
┌─────────────┐
│   🚗         │
│  ___________ │  ← Stored in full
└─────────────┘

Frame N+1 (P-frame): Only the difference
"Move the car in frame N 15 pixels to the right"
─────────► A few bytes to describe

The technique: Motion Estimation + Motion Compensation:

Divide the frame into small blocks (in H.264, 16×16 macroblocks that can be split into smaller partitions)
For each block, search the previous frame for the best match
Record only the motion vector (how far it moved) + residual (tiny remaining difference)

For typical content, inter-frame prediction removes far more redundancy than intra-frame compression alone, which is why a video file is so much smaller than the same frames stored as a JPEG image sequence.
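
The block-matching steps above can be sketched as a brute-force search. This is an illustration only: it uses the sum of absolute differences (SAD) as the matching cost and checks every offset in a tiny window, while production encoders use much smarter search patterns and sub-pixel precision. The frames are synthetic gradients, and the function names are ours.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in `cur` and the block
    displaced by (dx, dy) in the reference frame `ref`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def best_motion_vector(cur, ref, bx, by, size=4, search=2):
    """Full search over offsets in [-search, search]; returns ((dx, dy), cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


WIDTH = HEIGHT = 8
ref = [[5 * x + 20 * y for x in range(WIDTH)] for y in range(HEIGHT)]
# Current frame: the same content moved 1 pixel to the right.
cur = [[row[max(x - 1, 0)] for x in range(WIDTH)] for row in ref]

mv, cost = best_motion_vector(cur, ref, bx=2, by=2)
print(mv, cost)   # (-1, 0) 0
```

The vector `(-1, 0)` says the block's content sits one pixel to the left in the reference frame, and a cost of 0 means the residual is empty, so the block is fully described by the vector alone.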

Additional Codec Techniques (Know They Exist, Don’t Memorize)

Technique	What it does	Effect
Transform	DCT / Integer Transform: pixels → frequency coefficients	Prepares data for quantization
Quantization	Divide coefficients by an integer, round	Main quality/compression knob
Entropy Coding	CABAC / CAVLC: use fewer bits for common symbols	Lossless final squeeze
In-loop Filter	Remove blocking artifacts	Smoother image
SAO (H.265+) / ALF (H.266)	Sample adaptive offset / adaptive loop filter	Reduces ringing and other artifacts near edges
Multi-reference	P/B frames can reference multiple frames (B-frames can also reference future frames)	Better prediction → smaller residuals

In practice, you control these through encoder parameters — no need to implement them yourself.
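
Still, seeing one technique in miniature makes the table less abstract. CABAC and CAVLC are too elaborate for a short example, so the sketch below uses unsigned Exp-Golomb coding as a stand-in. It is a much simpler variable-length code that H.264 uses for many header syntax elements, and it shows the same principle: small, common values get short codes.

```python
def exp_golomb(value: int) -> str:
    """Unsigned Exp-Golomb code for a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]               # value + 1 written in binary
    return "0" * (len(binary) - 1) + binary   # prefixed by (length - 1) zeros


for v in range(5):
    print(v, exp_golomb(v))
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
```

After quantization most values are zero or close to it, so a code like this spends 1 bit on the common case and longer codes only on the rare large values.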

The Five Major Codecs

H.264 / AVC (2003): The Universal Gold Standard

Compatibility: virtually every device that plays video supports it
Compression: our baseline for comparison
Patents: paid (MPEG LA pool), but industry default
Best for: maximum compatibility, constrained compute

H.264 is 20+ years old and still the fallback codec for YouTube, Facebook, and Zoom.

H.265 / HEVC (2013): Better Compression, Licensing Nightmare

Compression: ~37% bitrate savings vs H.264 at equivalent quality
Patents: chaotic and expensive — three patent pools (MPEG LA, HEVC Advance, Velos Media) + many un-pooled patents
Hardware decode: iPhone 7+ (2016), Android 6+ flagships, most 4K TVs
Best for: Apple ecosystem, 4K streaming, bandwidth-sensitive scenarios

The licensing mess is why HEVC adoption on the web was so slow. Chrome and Firefox both resisted adding support.

VP9 (2013): Google’s Free Alternative

By: Google (acquired On2 Technologies)
Compression: close to H.265
Patents: royalty-free
Primary use: YouTube, Google Meet
Caveat: iOS does not natively support VP9

AV1 (2018): The Royalty-Free Next Generation

By: Alliance for Open Media, AOMedia (Google, Netflix, Meta, Amazon, Cisco, Microsoft, Intel, Apple, and more)
Compression: ~53% savings vs H.264, ~25% better than H.265
Patents: royalty-free
Hardware decode: iPhone 15 Pro+ (2023), Pixel 6+, Snapdragon 8 Gen 2+, Intel Arc, NVIDIA RTX 30+ (RTX 40+ adds hardware encode)
Encoding speed: early encoders were very slow; SVT-AV1 has improved dramatically, though CPU encoding is still ~2–5× slower than H.265

Netflix, YouTube, TikTok, and Meta are all moving toward AV1 as the primary codec.

H.266 / VVC (2020): Latest Generation, Not Yet Mainstream

Compression: ~78% savings vs H.264, ~25–30% better than AV1
Hardware: flagships starting in 2024; consumer coverage still low
Status: wait and see

Comparison Table

	H.264	H.265	VP9	AV1	H.266
Year	2003	2013	2013	2018	2020
Savings vs H.264	baseline	37%	~30%	53%	78%
Encode speed	Fastest	Mid	Mid	Slow	Slowest
Decode load	Lightest	Mid	Mid	Higher	Heavy
Compatibility	Universal	Very good	Good (web)	Growing	Low
Royalties	Paid	High + messy	Free	Free	Paid

These percentages come from specific test sets and conditions (BD-rate with VMAF/PSNR). Real-world results vary significantly by content type (animation vs. live action vs. screen recording). Don’t use them as absolute marketing claims.
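
Savings quoted against H.264 do not subtract directly when you compare two newer codecs with each other. A quick sketch of the conversion, using the table's figures (the helper name `relative_saving` is ours):

```python
def relative_saving(saving_a: float, saving_b: float) -> float:
    """Bitrate saving of codec B relative to codec A, where both inputs
    are savings measured against the same baseline (here H.264)."""
    return 1 - (1 - saving_b) / (1 - saving_a)


# H.265 saves 37% and AV1 saves 53% versus H.264.
print(f"AV1 vs H.265: {relative_saving(0.37, 0.53):.0%}")   # AV1 vs H.265: 25%
```

This matches the "~25% better than H.265" figure quoted for AV1 above: 53% minus 37% is 16 points, but relative to H.265's already smaller bitrate the gain is about 25%.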

How to Choose a Codec in Practice

Where do your users watch?
│
├── ① Web browsers + all phones + legacy set-top boxes
│     → Must have H.264 (fallback)
│     → Add H.265 (iOS) + AV1 (Android flagships / modern Chrome) for bandwidth savings
│
├── ② Native app only (iOS + Android + optional TV)
│     → H.264 + H.265 as primary; AV1 gradual rollout by device capability
│
├── ③ Web-first, global bandwidth cost matters (YouTube/Netflix scale)
│     → AV1 primary + H.264 fallback
│
└── ④ 4K / HDR premium content
      → H.265 / AV1 + Dolby Vision
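
On the delivery side, this decision usually reduces to "serve the most efficient codec that the device can decode and that you have actually encoded". A minimal sketch of that rule follows. The names `PREFERENCE` and `pick_codec` are hypothetical, and how you learn a device's capabilities (for example from the client at playback time) is outside its scope.

```python
PREFERENCE = ["av1", "h265", "h264"]   # most bandwidth-efficient first


def pick_codec(device_supports: set[str], encoded: set[str]) -> str:
    """Return the most efficient codec available on both sides."""
    for codec in PREFERENCE:
        if codec in device_supports and codec in encoded:
            return codec
    raise LookupError("no common codec; always keep an H.264 rendition")


catalog = {"h264", "h265", "av1"}
print(pick_codec({"h264", "h265"}, catalog))          # h265
print(pick_codec({"h264", "h265", "av1"}, catalog))   # av1
print(pick_codec({"h264"}, catalog))                  # h264
```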

A Typical Short-Form Video Encoding Ladder

Tier	Resolution	H.264 bitrate	H.265 bitrate	AV1 bitrate
Low	360p	500 kbps	350 kbps	250 kbps
Mid	540p	900 kbps	650 kbps	480 kbps
Main	720p	1.5 Mbps	1.0 Mbps	750 kbps
High	1080p	3.5 Mbps	2.2 Mbps	1.6 Mbps
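
To see what these bitrates mean in bytes, the sketch below converts the ladder into video payload size for a 60-second clip. It assumes the bitrate is an average over the clip and ignores audio and container overhead. The dictionary simply restates the table.

```python
LADDER_KBPS = {
    "360p":  {"h264": 500,  "h265": 350,  "av1": 250},
    "540p":  {"h264": 900,  "h265": 650,  "av1": 480},
    "720p":  {"h264": 1500, "h265": 1000, "av1": 750},
    "1080p": {"h264": 3500, "h265": 2200, "av1": 1600},
}


def clip_megabytes(kbps: int, seconds: float) -> float:
    """Video payload in MB (10^6 bytes) at a given average bitrate."""
    return kbps * 1000 * seconds / 8 / 1_000_000


for tier, rates in LADDER_KBPS.items():
    sizes = ", ".join(f"{codec} {clip_megabytes(kbps, 60):.2f} MB"
                      for codec, kbps in rates.items())
    print(f"{tier}: {sizes}")

saving_1080p = 1 - LADDER_KBPS["1080p"]["av1"] / LADDER_KBPS["1080p"]["h264"]
print(f"AV1 vs H.264 at 1080p: {saving_1080p:.0%} fewer bits")   # 54% fewer bits
```

At 1080p a one-minute clip is 26.25 MB in H.264 versus 12 MB in AV1, which is the gap that gets multiplied by every view.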

Hands-On: Compress a Video with ffmpeg

Basic H.264

ffmpeg -i input.mov -c:v libx264 output.mp4

CRF Quality Control

ffmpeg -i input.mov \
  -c:v libx264 -preset medium -crf 23 \
  -c:a aac -b:a 128k \
  output.mp4

Parameter	Meaning
`-preset medium`	Speed/compression trade-off. Options: ultrafast → veryslow. Slower = smaller file at same quality
`-crf 23`	Quality target, 0–51. Lower = better quality, larger file. Default is 23.
`-c:a aac -b:a 128k`	Audio: AAC at 128 kbps

CRF quick reference: 18 = visually lossless, 23 = high quality, 28 = acceptable (visible compression).

Stream-Ready VOD Encoding

ffmpeg -i input.mov \
  -c:v libx264 -preset slow -crf 22 \
  -profile:v high -level 4.0 \
  -g 60 -keyint_min 60 -sc_threshold 0 \
  -c:a aac -b:a 128k \
  -movflags +faststart \
  output.mp4

Parameter	Why
`-g 60 -keyint_min 60`	One I-frame every 60 frames. At 30 fps = 2-second GOP, aligns with segmentation.
`-sc_threshold 0`	Disable scene-cut auto I-frame insertion. Ensures all bitrate tiers have I-frames at the same positions.
`-movflags +faststart`	Move the MP4 “table of contents” (moov box) to the start of the file — enables progressive playback. See Part 4.
`-profile:v high -level 4.0`	Compatibility: Level 4.0 supports up to 1080p30.
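
The GOP numbers follow directly from frame rate and GOP duration, so it helps to derive them rather than hard-code 60. Below is a small sketch that builds the same three flags for any integer frame rate and checks that a segment length is a whole number of GOPs. The helper names are ours, and fractional rates such as 29.97 fps are deliberately not handled.

```python
def gop_flags(fps: int, gop_seconds: int) -> list[str]:
    """ffmpeg flags for a fixed keyframe interval with scene-cut detection off."""
    frames = str(fps * gop_seconds)
    return ["-g", frames, "-keyint_min", frames, "-sc_threshold", "0"]


def segment_is_aligned(segment_seconds: int, gop_seconds: int) -> bool:
    """A segment can start on a keyframe only if it spans whole GOPs."""
    return segment_seconds % gop_seconds == 0


print(gop_flags(30, 2))   # ['-g', '60', '-keyint_min', '60', '-sc_threshold', '0']
print(gop_flags(25, 2))   # ['-g', '50', '-keyint_min', '50', '-sc_threshold', '0']
print(segment_is_aligned(6, 2), segment_is_aligned(5, 2))   # True False
```

Using the same flags for every tier of the ladder is what keeps keyframes at identical positions across renditions.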

H.265 Encoding

ffmpeg -i input.mov \
  -c:v libx265 -preset medium -crf 26 \
  -tag:v hvc1 \
  -c:a aac -b:a 128k \
  -movflags +faststart \
  output_hevc.mp4

Note: H.265 CRF values need to be ~3–5 higher than H.264 for equivalent visual quality (CRF 26 ≈ H.264 CRF 22). The -tag:v hvc1 tag is required for Apple devices to recognize the file.
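
If you script your transcodes, the per-codec differences fit in a small table. The sketch below builds ffmpeg argument lists for H.264, H.265, and AV1 (through ffmpeg's `libsvtav1` encoder). The H.264 and H.265 settings mirror the commands in this article. The AV1 CRF of 32 is our assumption as a starting point, not a recommendation from the article, and `encode_command` is a hypothetical helper name.

```python
ENCODERS = {
    # codec: (ffmpeg encoder, preset, CRF, extra flags)
    "h264": ("libx264", "medium", 23, []),
    "h265": ("libx265", "medium", 26, ["-tag:v", "hvc1"]),   # hvc1 tag for Apple devices
    "av1":  ("libsvtav1", "8", 32, []),                      # CRF 32 is an assumed value
}


def encode_command(codec: str, src: str, dst: str) -> list[str]:
    """Build an ffmpeg argument list; nothing is executed here."""
    encoder, preset, crf, extra = ENCODERS[codec]
    return ["ffmpeg", "-i", src,
            "-c:v", encoder, "-preset", preset, "-crf", str(crf), *extra,
            "-c:a", "aac", "-b:a", "128k",
            "-movflags", "+faststart",
            dst]


for codec in ENCODERS:
    print(" ".join(encode_command(codec, "input.mov", f"output_{codec}.mp4")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine whose ffmpeg build includes these encoders.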

Compression Results (1-min 4K source, ~2 GB raw)

Command	Output size	Ratio
Uncompressed YUV	~2 GB	100%
H.264 CRF 23	~30 MB	1.5%
H.264 CRF 18	~80 MB	4%
H.265 CRF 26	~18 MB	0.9%
AV1 (SVT-AV1 preset 8)	~12 MB	0.6%

That is roughly 25–170× smaller than the source, and the result is virtually indistinguishable on a phone screen.

Per-Title and Per-Shot Encoding

The default approach is a fixed bitrate ladder for all content. But:

A cartoon (flat colors, little detail) looks great at 1080p@1000k
A concert (flashing lights, fast motion) still shows compression at 1080p@5000k

Per-Title Encoding (Netflix, 2015): calculate the optimal bitrate ladder for each title based on its visual complexity.

Traditional: Same ladder for every movie
  360p@500k / 720p@1500k / 1080p@4000k

Per-Title: Custom ladder per movie
  Cartoon:  360p@300k / 720p@800k / 1080p@1800k   (saves money)
  Concert:  360p@700k / 720p@2200k / 1080p@5500k  (needs more bits)

Per-Shot Encoding (Netflix, 2018) goes further: split a movie at scene cuts, then optimize each shot independently using VMAF as the quality target. The claimed gain is a further 17% bitrate saving at equal quality.
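
At its core, both approaches answer the same question: what is the cheapest bitrate that still reaches the quality target for this particular content? Here is a toy sketch of that selection step. The measurement points are invented for illustration, and this is not Netflix's actual algorithm, which searches across resolutions and bitrates together.

```python
def cheapest_bitrate(points, target_vmaf):
    """points: (bitrate_kbps, vmaf) pairs from trial encodes of one title.
    Returns the lowest bitrate that meets the target, or None."""
    passing = [kbps for kbps, vmaf in points if vmaf >= target_vmaf]
    return min(passing) if passing else None


# Hypothetical trial encodes at 1080p (made-up numbers).
cartoon = [(1000, 90.1), (1800, 94.0), (3000, 96.5), (4000, 97.2)]
concert = [(1800, 78.3), (3000, 86.0), (4000, 90.2), (5500, 93.4)]

print(cheapest_bitrate(cartoon, 93))   # 1800
print(cheapest_bitrate(concert, 93))   # 5500
```

Per-shot encoding runs the same selection once per shot instead of once per title.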

For most platforms, cloud providers’ “smart transcoding” templates (AWS MediaConvert QVBR, Alibaba Cloud “Narrowband HD”) deliver most of the benefit without building this yourself.

Hardware vs. Software Encoding

Approach	Implementation	Speed	Quality	Best for
Software (CPU)	libx264 / libx265 / SVT-AV1	Slow	Best	VOD offline transcoding
Hardware (GPU/ASIC)	NVIDIA NVENC, Intel QSV, Apple VideoToolbox	5–50× faster	Slightly lower	Live streaming, real-time

VOD should use software encoding: you encode once, but the bandwidth savings apply to every playback. Live streaming has to encode in real time, since you can't spend 10 seconds encoding 1 second of video, so it usually relies on hardware encoders or fast software presets.

VMAF, PSNR, SSIM: Measuring Visual Quality

Metric	Full name	Method	Range	Correlation with human perception
PSNR	Peak Signal-to-Noise Ratio	Pixel-level difference	0–∞ dB (higher = better)	Weak
SSIM	Structural Similarity	Luminance/contrast/structure	0–1 (higher = better)	Medium
VMAF	Video Multi-Method Assessment Fusion	ML fusion of multiple features	0–100 (higher = better)	Strong
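
PSNR is simple enough to compute by hand, which also shows why it tracks perception poorly: it only looks at the average squared pixel error. A minimal sketch for 8-bit samples given as flat lists:

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf          # identical signals: the "infinity" end of the range
    return 10 * math.log10(max_value ** 2 / mse)


original = [100, 120, 140, 160]
off_by_one = [101, 121, 141, 161]

print(psnr(original, original))               # inf
print(round(psnr(original, off_by_one), 2))   # 48.13
```

Every pixel counts equally here, so an error hidden in busy texture scores the same as one on a flat sky, even though viewers notice the second far more.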

VMAF (open-sourced by Netflix) is the industry standard for perceptual quality:

VMAF ≥ 93: Visually lossless
VMAF ≈ 80: High quality
VMAF ≈ 60: Acceptable
VMAF ≤ 40: Obvious compression artifacts
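
A sketch of how these thresholds might be used in a pipeline. The first helper builds an ffmpeg command for the `libvmaf` filter, which requires an ffmpeg build with libvmaf enabled and two inputs of the same resolution and frame rate. The second maps a score onto the bands above. Treating each value as a lower bound, and labeling everything under 60 as "visible artifacts", is our simplification of the list.

```python
def vmaf_command(distorted: str, reference: str) -> list[str]:
    """ffmpeg arguments to score `distorted` against `reference` with libvmaf.
    The filter expects the distorted clip first and the reference second."""
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", "libvmaf", "-f", "null", "-"]


def vmaf_band(score: float) -> str:
    """Map a VMAF score (0-100) to a rough quality label."""
    if score >= 93:
        return "visually lossless"
    if score >= 80:
        return "high quality"
    if score >= 60:
        return "acceptable"
    return "visible artifacts"


print(" ".join(vmaf_command("output.mp4", "input.mov")))
print(vmaf_band(95.2), "|", vmaf_band(84.0), "|", vmaf_band(38.5))
# visually lossless | high quality | visible artifacts
```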

Key Takeaways

Video compression achieves <1% of raw size through intra-frame (compress each image) + inter-frame (store only differences) compression.
Encoding is slow, decoding is fast. Encoder + Decoder = Codec.
H.264 is the universal fallback. H.265 saves 37% but has messy licensing. AV1 is free and saves 53%. VVC is the future but not ready yet.
VOD transcoding essentials: CRF quality control, GOP alignment, faststart, disable scene-cut.
Per-title / per-shot encoding is an advanced optimization; cloud “smart transcoding” covers most gains.
VMAF is the industry-standard quality metric.

Previous: Part 1: Video Fundamentals

Next: Part 3: Audio Fundamentals
