VOD Deep Dive Part 5: Streaming Protocols — How HLS and DASH Actually Work

Why progressive download fails, how HLS two-level manifests and DASH MPD work, CMAF dual-manifest best practices, LL-HLS for low latency, and when to consider WebRTC.

zhuermu · · 25 min
vodstreaminghlsdashcmafll-hls

This is Part 5 of the VOD Streaming Deep Dive series.


Why Can’t You Just Download an MP4?

The simplest approach: put video.mp4 on an HTTP server and let users download it. This is progressive download. Fatal flaws:

  1. Without faststart, the entire file must download before playback (see Part 4).
  2. When bandwidth drops, the player can’t switch to a lower quality — it just buffers.
  3. Seeking uses HTTP Range requests on a single large file — CDN cache-unfriendly.
  4. You can’t serve different versions to different devices (an old phone gets the same 1080p as a 4K TV).

The solution: split the video into small segments, write a “directory file” that tells the player what to download next. That’s a streaming protocol.


The Core Idea

Modern streaming has three components:

1. Cut the video into small segments
   ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ...
   │s1│ │s2│ │s3│ │s4│ │s5│
   └──┘ └──┘ └──┘ └──┘ └──┘
   Each 2-6 seconds

2. Produce a set of segments for each quality level
   360p:  ┌──┐ ┌──┐ ┌──┐ ...
   720p:  ┌──┐ ┌──┐ ┌──┐ ...
   1080p: ┌──┐ ┌──┐ ┌──┐ ...

3. Write a manifest telling the player where to find everything
   manifest:
     "360p, 720p, and 1080p available"
     "Each tier: seg_001.m4s through seg_100.m4s"

The player reads the manifest, then:

  • Second 1: Picks a conservative tier and starts downloading
  • Second 2: Measures actual throughput, decides the next segment’s tier
  • Continuously: Upgrades on good bandwidth, downgrades on bad, drops further on stall

This decision process is called Adaptive Bitrate (ABR) — covered in Part 6. This chapter covers the protocol layer.


HLS (HTTP Live Streaming)

Creator: Apple, 2009, shipped with iOS 3.0.

Standard: IETF RFC 8216 (updated in draft-pantos-hls-rfc8216bis).

Two-Level M3U8 Playlists

M3U8 is a UTF-8 text file (extended M3U format). HLS uses two levels:

         Master Playlist              Media Playlist
         (top-level directory)        (per-tier segment list)
         master.m3u8

         ├─► 360p/index.m3u8  ──► 360p/seg1.m4s, seg2.m4s ...

         ├─► 720p/index.m3u8  ──► 720p/seg1.m4s, seg2.m4s ...

         └─► 1080p/index.m3u8 ──► 1080p/seg1.m4s, seg2.m4s ...

Master Playlist Example

#EXTM3U
#EXT-X-VERSION:7

#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,CODECS="avc1.640016,mp4a.40.2"
360p/index.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2"
720p/index.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
1080p/index.m3u8

Key fields:

  • BANDWIDTH: Maximum bitrate for this tier (required)
  • RESOLUTION: Pixel dimensions
  • CODECS: RFC 6381 codec string — tells the browser what decoder to use

Media Playlist Example (fMP4, VOD)

#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:4
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-MAP:URI="init.mp4"

#EXTINF:4.000,
seg_00001.m4s
#EXTINF:4.000,
seg_00002.m4s
#EXTINF:4.000,
seg_00003.m4s
...
#EXTINF:3.120,
seg_00150.m4s

#EXT-X-ENDLIST
  • EXT-X-TARGETDURATION:4: Maximum segment duration is 4 seconds
  • EXT-X-PLAYLIST-TYPE:VOD: VOD mode (vs EVENT or LIVE)
  • EXT-X-MAP:URI="init.mp4": fMP4 init segment location
  • EXTINF:4.000,: Next segment is 4 seconds long
  • EXT-X-ENDLIST: Playlist is complete (required for VOD; absent in live)

Multi-Audio and Subtitles

#EXTM3U
#EXT-X-VERSION:7

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="en",NAME="English",DEFAULT=YES,URI="audio_en/index.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="zh",NAME="中文",URI="audio_zh/index.m3u8"

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",LANGUAGE="en",NAME="English",URI="subs_en/index.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",LANGUAGE="zh",NAME="中文",URI="subs_zh/index.m3u8"

#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2",AUDIO="audio",SUBTITLES="subs"
720p/index.m3u8

HLS strengths: native on all iOS/Safari, HTTPS-based (firewall-friendly), human-readable text format, highest global market share.


DASH (Dynamic Adaptive Streaming over HTTP)

Creator: MPEG (ISO/IEC 23009-1), standardized 2012. Designed as an open standard alternative to Apple’s proprietary HLS.

The manifest is an XML file: .mpd (Media Presentation Description).

Simplified MPD Example

<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     type="static"
     mediaPresentationDuration="PT10M30S"
     minBufferTime="PT2S">

  <Period>
    <AdaptationSet mimeType="video/mp4" codecs="avc1.64001f">
      <Representation id="360p" bandwidth="800000" width="640" height="360">
        <BaseURL>360p/</BaseURL>
        <SegmentTemplate initialization="init.mp4"
                         media="seg_$Number$.m4s"
                         timescale="1000"
                         duration="4000"
                         startNumber="1"/>
      </Representation>
      <Representation id="720p" bandwidth="2500000" width="1280" height="720">
        <BaseURL>720p/</BaseURL>
        <SegmentTemplate initialization="init.mp4"
                         media="seg_$Number$.m4s"
                         timescale="1000"
                         duration="4000"
                         startNumber="1"/>
      </Representation>
    </AdaptationSet>

    <AdaptationSet mimeType="audio/mp4" codecs="mp4a.40.2" lang="en">
      <Representation id="audio_en" bandwidth="128000">
        <BaseURL>audio_en/</BaseURL>
        <SegmentTemplate initialization="init.mp4"
                         media="seg_$Number$.m4s"
                         duration="4000"/>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Key concepts:

  • Period: The entire presentation. Movies typically have one; ad-inserted content has multiple.
  • AdaptationSet: A media type (video / audio / subtitles / specific language).
  • Representation: A specific version within an AdaptationSet (360p, 720p, 1080p).
  • SegmentTemplate: Template-based segment URLs — more compact than listing every segment.

HLS vs DASH

HLSDASH
Manifest formatM3U8 (text)MPD (XML)
Segment containerTS / fMP4 (modern)fMP4 / WebM
iOS/SafariNativeNeeds MSE (JS)
AndroidSupportedSupported
Web (Chrome/Firefox)Via hls.jsVia dash.js / Shaka
Open standardIETF standardizingISO/IEC standard

CMAF + Dual Manifest: Industry Best Practice

As covered in Part 4, CMAF lets HLS and DASH share the same fMP4 segments.

Typical directory structure:

/vod/episode-01/
  init.mp4              ← CMAF init segment
  seg_00001.m4s         ← Shared video data
  seg_00002.m4s
  seg_00003.m4s
  ...

  hls/
    master.m3u8         ← HLS manifest (references shared segments)
    360p/index.m3u8
    720p/index.m3u8

  dash/
    manifest.mpd        ← DASH manifest (references same segments)
  • iPhone user → gets master.m3u8 → downloads seg_*.m4s
  • Android user → gets manifest.mpd → downloads the same seg_*.m4s

CDN cache hit rate maximized.


The Latency Problem: Why Live Streams Lag 30 Seconds

Traditional HLS/DASH has 20–30 seconds of startup latency:

Encoder:   [produce 6s segment]──────►
HLS spec:  Client waits for 3 segments before playing = 3 × 6 = 18s
Add:       Playback buffer + network = 25-30s total

For live sports, e-commerce live, and interactive streaming, this is too much.

LL-HLS (Low-Latency HLS)

Apple’s 2019 solution. Target: end-to-end < 2 seconds.

Key techniques:

  1. Partial Segments: Split 6-second segments into 200–500ms “parts” so the player gets data earlier.
  2. Blocking Playlist Reload: Player requests with ?_HLS_msn=X&_HLS_part=Y, server holds the response until new data is available (like long polling).
  3. Preload Hint: Tells the player what’s coming next.

LL-DASH / CMAF-LL

DASH achieves the same via HTTP Chunked Transfer Encoding — the encoder pushes each CMAF chunk (~300ms) immediately without waiting for the full segment.

Latency Comparison

ProtocolTypical latencyComplexity
Traditional HLS (TS, 6s × 3)20–30 secLow
Modern HLS (fMP4, 4s × 3)8–15 secLow
LL-HLS / CMAF-LL2–5 secMedium
WebRTC<500msHigh

WebRTC: A Different Road

WebRTC is not an upgrade to HLS/DASH — it’s fundamentally different technology:

HLS/DASHWebRTC
TransportHTTP/HTTPSUDP + DTLS + SRTP
CDN-friendlyYes (HTTP caching)No (peer-to-peer or dedicated SFU)
Latency2–30 sec<500ms
ScaleMillions concurrent easilyLimited by SFU capacity
Use caseMovies, VOD, one-to-many liveVideo calls, interactive, cloud gaming

VOD doesn’t use WebRTC. This series focuses on HLS/DASH/CMAF.


Protocol Selection Guide

What are you building?

├── ① Pure VOD
│     → HLS + DASH dual manifest + CMAF fMP4

├── ② Standard live streaming (<10s latency OK)
│     → HLS + DASH + CMAF fMP4

├── ③ Low-latency live (sports, e-commerce, <3s)
│     → LL-HLS + LL-DASH + CMAF-LL

├── ④ Ultra-low latency (interactive, <500ms)
│     → WebRTC or RTMP-over-QUIC

└── ⑤ IPTV (carrier set-top boxes)
      → MPEG-TS over UDP/HTTP

Hands-On: Generate HLS + DASH with ffmpeg and Shaka Packager

Single-tier HLS (fMP4)

ffmpeg -i input.mp4 \
  -c:v libx264 -preset slow -crf 22 -g 96 -keyint_min 96 -sc_threshold 0 \
  -c:a aac -b:a 128k \
  -f hls \
  -hls_time 4 \
  -hls_segment_type fmp4 \
  -hls_playlist_type vod \
  -hls_list_size 0 \
  -hls_segment_filename "hls/seg_%04d.m4s" \
  hls/index.m3u8

Multi-bitrate HLS (three tiers in one command)

ffmpeg -i input.mp4 \
  -filter_complex "[0:v]split=3[v1][v2][v3]; \
    [v1]scale=640:360[v1out]; \
    [v2]scale=1280:720[v2out]; \
    [v3]scale=1920:1080[v3out]" \
  -map "[v1out]" -c:v:0 libx264 -b:v:0 800k -maxrate:v:0 850k -bufsize:v:0 1600k \
  -map "[v2out]" -c:v:1 libx264 -b:v:1 2500k -maxrate:v:1 2650k -bufsize:v:1 5000k \
  -map "[v3out]" -c:v:2 libx264 -b:v:2 5000k -maxrate:v:2 5300k -bufsize:v:2 10000k \
  -map 0:a -c:a aac -b:a 128k \
  -g 96 -keyint_min 96 -sc_threshold 0 \
  -f hls -hls_time 4 -hls_segment_type fmp4 -hls_playlist_type vod -hls_list_size 0 \
  -master_pl_name master.m3u8 \
  -var_stream_map "v:0,a:0 v:1,a:0 v:2,a:0" \
  "hls/v%v/index.m3u8"

Production-grade: Shaka Packager for CMAF + HLS + DASH

packager \
  in=input_360p.mp4,stream=video,init_segment=cmaf/v0/init.mp4,segment_template=cmaf/v0/seg_\$Number\$.m4s \
  in=input_720p.mp4,stream=video,init_segment=cmaf/v1/init.mp4,segment_template=cmaf/v1/seg_\$Number\$.m4s \
  in=input_1080p.mp4,stream=video,init_segment=cmaf/v2/init.mp4,segment_template=cmaf/v2/seg_\$Number\$.m4s \
  in=input_720p.mp4,stream=audio,init_segment=cmaf/a0/init.mp4,segment_template=cmaf/a0/seg_\$Number\$.m4s \
  --segment_duration 4 \
  --hls_master_playlist_output=cmaf/master.m3u8 \
  --mpd_output=cmaf/manifest.mpd

Output: one set of segments, both HLS and DASH manifests.


Key Takeaways

  1. Streaming = segments + manifest + client-driven on-demand fetching.
  2. HLS (Apple) uses M3U8 text manifests; DASH (MPEG) uses MPD XML.
  3. iOS/Safari natively supports only HLS; other platforms support both.
  4. CMAF lets HLS and DASH share one set of fMP4 files — the industry best practice.
  5. Traditional HLS latency is 20–30s; LL-HLS / CMAF-LL achieves 2–5s.
  6. WebRTC is a separate path (<500ms) — not for VOD.
  7. In production, use Shaka Packager or cloud services (MediaPackage) for packaging.

Previous: Part 4: Container Formats

Next: Part 6: Adaptive Bitrate Streaming