VOD Deep Dive Part 5: Streaming Protocols — How HLS and DASH Actually Work
Why progressive download fails, how HLS two-level manifests and DASH MPD work, CMAF dual-manifest best practices, LL-HLS for low latency, and when to consider WebRTC.
This is Part 5 of the VOD Streaming Deep Dive series.
Why Can’t You Just Download an MP4?
The simplest approach: put video.mp4 on an HTTP server and let users download it. This is progressive download. Fatal flaws:
- Without faststart, the entire file must download before playback (see Part 4).
- When bandwidth drops, the player can’t switch to a lower quality — it just buffers.
- Seeking uses HTTP Range requests on a single large file — CDN cache-unfriendly.
- You can’t serve different versions to different devices (an old phone gets the same 1080p as a 4K TV).
The solution: split the video into small segments, write a “directory file” that tells the player what to download next. That’s a streaming protocol.
The Core Idea
Modern streaming has three components:
1. Cut the video into small segments
┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ...
│s1│ │s2│ │s3│ │s4│ │s5│
└──┘ └──┘ └──┘ └──┘ └──┘
Each 2-6 seconds
2. Produce a set of segments for each quality level
360p: ┌──┐ ┌──┐ ┌──┐ ...
720p: ┌──┐ ┌──┐ ┌──┐ ...
1080p: ┌──┐ ┌──┐ ┌──┐ ...
3. Write a manifest telling the player where to find everything
manifest:
"360p, 720p, and 1080p available"
"Each tier: seg_001.m4s through seg_100.m4s"
The player reads the manifest, then:
- Second 1: Picks a conservative tier and starts downloading
- Second 2: Measures actual throughput, decides the next segment’s tier
- Continuously: Upgrades on good bandwidth, downgrades on bad, drops further on stall
This decision process is called Adaptive Bitrate (ABR) — covered in Part 6. This chapter covers the protocol layer.
HLS (HTTP Live Streaming)
Creator: Apple, 2009, shipped with iOS 3.0.
Standard: IETF RFC 8216 (updated in draft-pantos-hls-rfc8216bis).
Two-Level M3U8 Playlists
M3U8 is a UTF-8 text file (extended M3U format). HLS uses two levels:
Master Playlist Media Playlist
(top-level directory) (per-tier segment list)
master.m3u8
│
├─► 360p/index.m3u8 ──► 360p/seg1.m4s, seg2.m4s ...
│
├─► 720p/index.m3u8 ──► 720p/seg1.m4s, seg2.m4s ...
│
└─► 1080p/index.m3u8 ──► 1080p/seg1.m4s, seg2.m4s ...
Master Playlist Example
#EXTM3U
#EXT-X-VERSION:7
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,CODECS="avc1.640016,mp4a.40.2"
360p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2"
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p/index.m3u8
Key fields:
BANDWIDTH: Maximum bitrate for this tier (required)RESOLUTION: Pixel dimensionsCODECS: RFC 6381 codec string — tells the browser what decoder to use
Media Playlist Example (fMP4, VOD)
#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:4
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="init.mp4"
#EXTINF:4.000,
seg_00001.m4s
#EXTINF:4.000,
seg_00002.m4s
#EXTINF:4.000,
seg_00003.m4s
...
#EXTINF:3.120,
seg_00150.m4s
#EXT-X-ENDLIST
EXT-X-TARGETDURATION:4: Maximum segment duration is 4 secondsEXT-X-PLAYLIST-TYPE:VOD: VOD mode (vs EVENT or LIVE)EXT-X-MAP:URI="init.mp4": fMP4 init segment locationEXTINF:4.000,: Next segment is 4 seconds longEXT-X-ENDLIST: Playlist is complete (required for VOD; absent in live)
Multi-Audio and Subtitles
#EXTM3U
#EXT-X-VERSION:7
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="en",NAME="English",DEFAULT=YES,URI="audio_en/index.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="zh",NAME="中文",URI="audio_zh/index.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",LANGUAGE="en",NAME="English",URI="subs_en/index.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",LANGUAGE="zh",NAME="中文",URI="subs_zh/index.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2",AUDIO="audio",SUBTITLES="subs"
720p/index.m3u8
HLS strengths: native on all iOS/Safari, HTTPS-based (firewall-friendly), human-readable text format, highest global market share.
DASH (Dynamic Adaptive Streaming over HTTP)
Creator: MPEG (ISO/IEC 23009-1), standardized 2012. Designed as an open standard alternative to Apple’s proprietary HLS.
The manifest is an XML file: .mpd (Media Presentation Description).
Simplified MPD Example
<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
type="static"
mediaPresentationDuration="PT10M30S"
minBufferTime="PT2S">
<Period>
<AdaptationSet mimeType="video/mp4" codecs="avc1.64001f">
<Representation id="360p" bandwidth="800000" width="640" height="360">
<BaseURL>360p/</BaseURL>
<SegmentTemplate initialization="init.mp4"
media="seg_$Number$.m4s"
timescale="1000"
duration="4000"
startNumber="1"/>
</Representation>
<Representation id="720p" bandwidth="2500000" width="1280" height="720">
<BaseURL>720p/</BaseURL>
<SegmentTemplate initialization="init.mp4"
media="seg_$Number$.m4s"
timescale="1000"
duration="4000"
startNumber="1"/>
</Representation>
</AdaptationSet>
<AdaptationSet mimeType="audio/mp4" codecs="mp4a.40.2" lang="en">
<Representation id="audio_en" bandwidth="128000">
<BaseURL>audio_en/</BaseURL>
<SegmentTemplate initialization="init.mp4"
media="seg_$Number$.m4s"
duration="4000"/>
</Representation>
</AdaptationSet>
</Period>
</MPD>
Key concepts:
- Period: The entire presentation. Movies typically have one; ad-inserted content has multiple.
- AdaptationSet: A media type (video / audio / subtitles / specific language).
- Representation: A specific version within an AdaptationSet (360p, 720p, 1080p).
- SegmentTemplate: Template-based segment URLs — more compact than listing every segment.
HLS vs DASH
| HLS | DASH | |
|---|---|---|
| Manifest format | M3U8 (text) | MPD (XML) |
| Segment container | TS / fMP4 (modern) | fMP4 / WebM |
| iOS/Safari | Native | Needs MSE (JS) |
| Android | Supported | Supported |
| Web (Chrome/Firefox) | Via hls.js | Via dash.js / Shaka |
| Open standard | IETF standardizing | ISO/IEC standard |
CMAF + Dual Manifest: Industry Best Practice
As covered in Part 4, CMAF lets HLS and DASH share the same fMP4 segments.
Typical directory structure:
/vod/episode-01/
init.mp4 ← CMAF init segment
seg_00001.m4s ← Shared video data
seg_00002.m4s
seg_00003.m4s
...
hls/
master.m3u8 ← HLS manifest (references shared segments)
360p/index.m3u8
720p/index.m3u8
dash/
manifest.mpd ← DASH manifest (references same segments)
- iPhone user → gets
master.m3u8→ downloadsseg_*.m4s - Android user → gets
manifest.mpd→ downloads the sameseg_*.m4s
CDN cache hit rate maximized.
The Latency Problem: Why Live Streams Lag 30 Seconds
Traditional HLS/DASH has 20–30 seconds of startup latency:
Encoder: [produce 6s segment]──────►
HLS spec: Client waits for 3 segments before playing = 3 × 6 = 18s
Add: Playback buffer + network = 25-30s total
For live sports, e-commerce live, and interactive streaming, this is too much.
LL-HLS (Low-Latency HLS)
Apple’s 2019 solution. Target: end-to-end < 2 seconds.
Key techniques:
- Partial Segments: Split 6-second segments into 200–500ms “parts” so the player gets data earlier.
- Blocking Playlist Reload: Player requests with
?_HLS_msn=X&_HLS_part=Y, server holds the response until new data is available (like long polling). - Preload Hint: Tells the player what’s coming next.
LL-DASH / CMAF-LL
DASH achieves the same via HTTP Chunked Transfer Encoding — the encoder pushes each CMAF chunk (~300ms) immediately without waiting for the full segment.
Latency Comparison
| Protocol | Typical latency | Complexity |
|---|---|---|
| Traditional HLS (TS, 6s × 3) | 20–30 sec | Low |
| Modern HLS (fMP4, 4s × 3) | 8–15 sec | Low |
| LL-HLS / CMAF-LL | 2–5 sec | Medium |
| WebRTC | <500ms | High |
WebRTC: A Different Road
WebRTC is not an upgrade to HLS/DASH — it’s fundamentally different technology:
| HLS/DASH | WebRTC | |
|---|---|---|
| Transport | HTTP/HTTPS | UDP + DTLS + SRTP |
| CDN-friendly | Yes (HTTP caching) | No (peer-to-peer or dedicated SFU) |
| Latency | 2–30 sec | <500ms |
| Scale | Millions concurrent easily | Limited by SFU capacity |
| Use case | Movies, VOD, one-to-many live | Video calls, interactive, cloud gaming |
VOD doesn’t use WebRTC. This series focuses on HLS/DASH/CMAF.
Protocol Selection Guide
What are you building?
│
├── ① Pure VOD
│ → HLS + DASH dual manifest + CMAF fMP4
│
├── ② Standard live streaming (<10s latency OK)
│ → HLS + DASH + CMAF fMP4
│
├── ③ Low-latency live (sports, e-commerce, <3s)
│ → LL-HLS + LL-DASH + CMAF-LL
│
├── ④ Ultra-low latency (interactive, <500ms)
│ → WebRTC or RTMP-over-QUIC
│
└── ⑤ IPTV (carrier set-top boxes)
→ MPEG-TS over UDP/HTTP
Hands-On: Generate HLS + DASH with ffmpeg and Shaka Packager
Single-tier HLS (fMP4)
ffmpeg -i input.mp4 \
-c:v libx264 -preset slow -crf 22 -g 96 -keyint_min 96 -sc_threshold 0 \
-c:a aac -b:a 128k \
-f hls \
-hls_time 4 \
-hls_segment_type fmp4 \
-hls_playlist_type vod \
-hls_list_size 0 \
-hls_segment_filename "hls/seg_%04d.m4s" \
hls/index.m3u8
Multi-bitrate HLS (three tiers in one command)
ffmpeg -i input.mp4 \
-filter_complex "[0:v]split=3[v1][v2][v3]; \
[v1]scale=640:360[v1out]; \
[v2]scale=1280:720[v2out]; \
[v3]scale=1920:1080[v3out]" \
-map "[v1out]" -c:v:0 libx264 -b:v:0 800k -maxrate:v:0 850k -bufsize:v:0 1600k \
-map "[v2out]" -c:v:1 libx264 -b:v:1 2500k -maxrate:v:1 2650k -bufsize:v:1 5000k \
-map "[v3out]" -c:v:2 libx264 -b:v:2 5000k -maxrate:v:2 5300k -bufsize:v:2 10000k \
-map 0:a -c:a aac -b:a 128k \
-g 96 -keyint_min 96 -sc_threshold 0 \
-f hls -hls_time 4 -hls_segment_type fmp4 -hls_playlist_type vod -hls_list_size 0 \
-master_pl_name master.m3u8 \
-var_stream_map "v:0,a:0 v:1,a:0 v:2,a:0" \
"hls/v%v/index.m3u8"
Production-grade: Shaka Packager for CMAF + HLS + DASH
packager \
in=input_360p.mp4,stream=video,init_segment=cmaf/v0/init.mp4,segment_template=cmaf/v0/seg_\$Number\$.m4s \
in=input_720p.mp4,stream=video,init_segment=cmaf/v1/init.mp4,segment_template=cmaf/v1/seg_\$Number\$.m4s \
in=input_1080p.mp4,stream=video,init_segment=cmaf/v2/init.mp4,segment_template=cmaf/v2/seg_\$Number\$.m4s \
in=input_720p.mp4,stream=audio,init_segment=cmaf/a0/init.mp4,segment_template=cmaf/a0/seg_\$Number\$.m4s \
--segment_duration 4 \
--hls_master_playlist_output=cmaf/master.m3u8 \
--mpd_output=cmaf/manifest.mpd
Output: one set of segments, both HLS and DASH manifests.
Key Takeaways
- Streaming = segments + manifest + client-driven on-demand fetching.
- HLS (Apple) uses M3U8 text manifests; DASH (MPEG) uses MPD XML.
- iOS/Safari natively supports only HLS; other platforms support both.
- CMAF lets HLS and DASH share one set of fMP4 files — the industry best practice.
- Traditional HLS latency is 20–30s; LL-HLS / CMAF-LL achieves 2–5s.
- WebRTC is a separate path (<500ms) — not for VOD.
- In production, use Shaka Packager or cloud services (MediaPackage) for packaging.
Previous: Part 4: Container Formats