VOD Deep Dive Part 3: Audio Fundamentals — Making Sound Small

How digital audio works: sampling rates, bit depth, channels, AAC vs Opus vs Dolby Atmos, multi-language tracks, loudness normalization, and practical ffmpeg recipes.

zhuermu · · 12 min
vod · streaming · audio · aac · dolby-atmos

This is Part 3 of the VOD Streaming Deep Dive series.


How Sound Becomes Digital

Sound is air vibration — a continuous waveform. Computers can only store numbers, not continuous waves. Two steps are needed:

  1. Sampling: Measure the wave’s height at regular intervals
  2. Quantization: Convert each measurement into a number
Amplitude

 │  ●   ●                     ●    Sample points
 │     ●  ●  ●
 │           ●  ●
 │              ●  ●   ●
 └──────────────────────► Time
     ↑ ↑ ↑ Sample N times per second — N is the "sample rate"
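To make the two steps concrete, here is a minimal sketch that samples a pure sine tone at a chosen rate and quantizes each sample to a signed integer of the chosen bit depth. The helper name is hypothetical, it assumes a POSIX shell with awk, and it scales to the largest positive code (32767 for 16-bit), which is one common convention.

```bash
# Hypothetical helper: sample a sine tone, then quantize each sample to an integer.
# Usage: sample_and_quantize RATE_HZ TONE_HZ BITS COUNT
sample_and_quantize() {
  awk -v rate="$1" -v freq="$2" -v bits="$3" -v count="$4" '
    # Round half away from zero (awk int() truncates toward zero).
    function round(x) { return (x < 0) ? -int(-x + 0.5) : int(x + 0.5) }
    BEGIN {
      pi = atan2(0, -1)
      max = 2 ^ (bits - 1) - 1              # largest positive code, e.g. 32767 for 16-bit
      for (n = 0; n < count; n++) {
        t = n / rate                        # step 1, sampling: time of the n-th sample
        a = sin(2 * pi * freq * t)          # wave height in [-1, 1] at that instant
        q = round(a * max)                  # step 2, quantization: snap to an integer code
        printf "%d%s", q, (n < count - 1 ? " " : "\n")
      }
    }'
}
```

For example, sample_and_quantize 8000 1000 16 4 prints 0 23170 32767 23170: a 1 kHz tone sampled 8,000 times per second gets 8 samples per cycle, and each measured height becomes one of 65,536 integer codes.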

Sample Rate

Unit: Hz (hertz — samples per second)

Sample rate | Use case
--- | ---
8 kHz | Telephone voice
16 kHz | Speech recognition, VoIP (Zoom/Teams)
22.05 kHz | Retro games, AM radio
44.1 kHz | CD audio, common for music
48 kHz | Video industry default (film, streaming, broadcast)
96 kHz | Hi-fi recording
192 kHz | Professional studio

Nyquist theorem: to reproduce a frequency F, the sample rate must be greater than 2F. Human hearing tops out around 20 kHz, so 44.1 kHz (which captures up to 22.05 kHz) and 48 kHz (up to 24 kHz) are just enough, with a small margin left for the anti-aliasing filter.

For VOD, standardize on 48 kHz. If your source is 44.1 kHz, resample during transcoding with -ar 48000.
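The Nyquist arithmetic is easy to check in the shell, and so is the frequency that a too-high tone folds down to ("aliases" to) if it is not filtered out before sampling. A minimal sketch, assuming integer frequencies in Hz; both function names are hypothetical.

```bash
# Highest frequency a sample rate can represent: half the rate (the Nyquist frequency).
nyquist_limit() {
  echo $(( $1 / 2 ))
}

# Frequency an unfiltered tone of FREQ_HZ appears at after sampling at RATE_HZ.
# Aliasing folds the tone around the nearest integer multiple of the sample rate.
alias_frequency() {
  local freq=$1 rate=$2
  local k=$(( (2 * freq + rate) / (2 * rate) ))   # nearest multiple of the rate (rounded)
  local diff=$(( freq - k * rate ))
  echo $(( diff < 0 ? -diff : diff ))
}
```

nyquist_limit 48000 prints 24000, and alias_frequency 30000 48000 prints 18000: an unfiltered 30 kHz tone would reappear as an audible 18 kHz tone, which is why audio is low-pass filtered before sampling or resampling.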


Bit Depth

How many bits per sample:

Bit depth | Quantization levels | Use case
--- | --- | ---
8-bit | 256 | Retro games, telephony
16-bit | 65,536 | CD, consumer streaming
24-bit | ~16.7 million | Professional recording
32-bit float | Floating point (huge headroom) | Audio production internal format

Most video audio is 16-bit, 48 kHz.
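Each extra bit doubles the number of quantization levels. A common textbook approximation puts the theoretical dynamic range of N-bit PCM at about 6.02·N + 1.76 dB (a full-scale sine measured against quantization noise). The sketch below simply evaluates those two formulas; the function name is hypothetical.

```bash
# Quantization levels (2^N) and approximate dynamic range (6.02*N + 1.76 dB) for N-bit PCM.
bit_depth_info() {
  awk -v bits="$1" 'BEGIN {
    printf "%d-bit: %.0f levels, ~%.1f dB dynamic range\n", bits, 2 ^ bits, 6.02 * bits + 1.76
  }'
}
```

bit_depth_info 16 prints 16-bit: 65536 levels, ~98.1 dB dynamic range. 24-bit works out to roughly 146 dB; that extra range is mostly useful as headroom while recording and mixing.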


Channels

A channel is an independent audio signal, usually fed to one speaker position; a single audio track can carry several channels:

Channels | Name | Configuration | Used in
--- | --- | --- | ---
1.0 | Mono | Single channel | Telephony, old TV
2.0 | Stereo | Left + Right | Music, most video
5.1 | Surround | Front L + Center + Front R + Rear L + Rear R + LFE (.1 = subwoofer) | Cinema, home theater
7.1 | Surround | 5.1 + two side channels | Premium home theater
7.1.4 | Atmos etc. | 7.1 + 4 overhead channels | Dolby Atmos

5.1 surround layout (top-down view):

         FL ──── C ──── FR
              │  🧑  │
              │      │
         SL ──┻━━━━──SR
                LFE
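The "X.Y.Z" names are a count of speaker feeds: X full-range channels, Y low-frequency (LFE) channels, and Z overhead channels. A small hypothetical helper that totals them; for Atmos it counts the 7.1.4 speaker layout only, not the audio objects the format can also carry.

```bash
# Total speaker channels implied by "X.Y[.Z]" layout notation, e.g. 5.1 -> 6, 7.1.4 -> 12.
channel_count() {
  printf '%s\n' "$1" | awk -F. '{
    total = 0
    for (i = 1; i <= NF; i++) total += $i   # full-range + LFE + overhead
    print total
  }'
}
```

channel_count 5.1 prints 6. This is also the number ffmpeg's -ac option expects; the -ac 2 in the transcoding recipe later in this article downmixes a 5.1 source to stereo.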

Audio Bitrate: How Many kbps Is Enough?

Audio bitrate is measured the same way (bits per second) but is much smaller than video bitrate, typically 5–10% of it.

Bitrate | Perception | Typical use
--- | --- | ---
32 kbps | Voice OK, music broken | Extreme low bandwidth
64 kbps | Voice clear, music passable | Low-bitrate scenarios
96 kbps | Music acceptable | Broadcast, YouTube default
128 kbps | Music sounds good | Streaming default
192 kbps | High fidelity | Premium music streaming
256 kbps | Audiophile-grade | Apple Music
320 kbps | MP3 maximum | Music enthusiasts
Lossless (FLAC) | Transparent | Hi-fi niche

For VOD: stereo AAC at 128 kbps is the correct answer for the vast majority of scenarios.
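Where do these numbers come from? Uncompressed PCM costs sample rate × bit depth × channels bits per second, so every row in the table is a large reduction from the source. A minimal sketch of the arithmetic; the function names are hypothetical, and the 2,000 kbps video bitrate in the example below is a made-up figure for illustration.

```bash
# Uncompressed PCM bitrate in kbps: sample_rate * bit_depth * channels / 1000.
pcm_kbps() {
  awk -v r="$1" -v b="$2" -v c="$3" 'BEGIN { printf "%g\n", r * b * c / 1000 }'
}

# Audio bitrate as a percentage of the video bitrate (both in kbps).
audio_share() {
  awk -v a="$1" -v v="$2" 'BEGIN { printf "%.1f%%\n", 100 * a / v }'
}
```

pcm_kbps 48000 16 2 prints 1536, so a 128 kbps AAC track is 12 times smaller than its 48 kHz / 16-bit stereo source. Next to a hypothetical 2,000 kbps video stream, audio_share 128 2000 prints 6.4%, inside the 5–10% range mentioned above.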


Major Audio Codecs

AAC (Advanced Audio Coding) — The Streaming Default

  • By: MPEG (the ISO/IEC group that also co-developed H.264, together with ITU-T)
  • Year: 1997
  • Compatibility: every video platform, browser, and phone
  • Variants:
    • AAC-LC (Low Complexity): Most common; the HLS/DASH default.
    • HE-AAC (High Efficiency): AAC-LC plus Spectral Band Replication (SBR); better at low bitrates (<64 kbps)
    • HE-AAC v2: HE-AAC plus Parametric Stereo; decent at 48 kbps
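In ffmpeg, the built-in aac encoder produces AAC-LC. HE-AAC and HE-AAC v2 need the Fraunhofer libfdk_aac encoder, which is only present in builds configured with --enable-libfdk-aac (it is non-free, so many distributions omit it). The helper below is a hypothetical rule of thumb that maps a target bitrate to a libfdk_aac profile name using the thresholds in the list above; treating 48 kbps as the cut-off for HE-AAC v2 is an assumption, not a standard.

```bash
# Hypothetical rule of thumb: pick a libfdk_aac profile for a target bitrate (kbps).
# Thresholds follow the AAC variant list; tune them for your content.
aac_profile() {
  local kbps=$1
  if [ "$kbps" -le 48 ]; then
    echo aac_he_v2        # HE-AAC v2: SBR + parametric stereo (needs stereo input)
  elif [ "$kbps" -lt 64 ]; then
    echo aac_he           # HE-AAC: SBR for low bitrates
  else
    echo aac_low          # AAC-LC for 64 kbps and up
  fi
}
```

For example, aac_profile 48 prints aac_he_v2, which can be passed to ffmpeg as -profile:a together with -c:a libfdk_aac -b:a 48k.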

MP3 — Retired

Classic but less efficient than AAC. Original patents expired in 2017. No reason to use MP3 in new projects.

Opus — The Web Newcomer

  • Open-source, royalty-free
  • Excellent from 6 kbps (voice) to 510 kbps (music)
  • WebRTC default, used by Discord
  • But HLS/DASH player support lags behind AAC, and iOS/Safari support has historically been limited

Dolby Family — Cinema-Grade

Codec | Use case
--- | ---
AC-3 (Dolby Digital) | 5.1 surround, Blu-ray, legacy HDTV
E-AC-3 / DD+ (Dolby Digital Plus) | 5.1/7.1, streaming movies
Dolby Atmos (E-AC-3 + JOC or AC-4) | Spatial audio, premium platforms

Dolby Atmos on Netflix, Disney+, and Apple TV+ is a hallmark of premium subscriptions.

FLAC / ALAC — Lossless

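Lossless compression typically shrinks PCM to roughly 50–70% of its original size while preserving every original sample exactly. FLAC is the common open format; ALAC is Apple's equivalent and is used for Apple Music's lossless tier. Neither is practical for video streaming: the bitrate is still many times higher than lossy AAC.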
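To see why lossless is impractical for streaming, estimate a lossless stream's bitrate from its PCM source. How much FLAC saves depends heavily on the material, so the remaining percentage is an input here rather than a fixed fact; the helper name is hypothetical.

```bash
# Estimated lossless bitrate (kbps) when the compressed file keeps PERCENT of the PCM size.
lossless_kbps() {
  awk -v pcm="$1" -v pct="$2" 'BEGIN { printf "%.1f\n", pcm * pct / 100 }'
}
```

CD-quality stereo PCM runs at 1,411.2 kbps (44,100 × 16 × 2 bits per second). Even if FLAC halved it, lossless_kbps 1411.2 50 prints 705.6, roughly 5.5 times the 128 kbps AAC recommended for VOD.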


Multi-Language Audio Tracks

A single video file can carry multiple audio tracks:

MP4 file
├── video track   (H.264)
├── audio track 1 (AAC, English)
├── audio track 2 (AAC, Chinese)
├── audio track 3 (AAC, Japanese)
└── subtitle track (WebVTT)
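With ffmpeg, a file like this is usually assembled by mapping each input explicitly and tagging every audio stream with a language. The wrapper below is a hypothetical sketch: it assumes one video file plus one single-track audio file per language, copies all streams without re-encoding, marks the first audio track as default, and uses three-letter ISO 639-2 codes (eng, zho, jpn), which is what the MP4 language field expects. Subtitles are left out for brevity, and setting DRY_RUN=1 prints the command instead of running it.

```bash
# Hypothetical helper: mux one video file and per-language audio files into one MP4.
# Usage: mux_languages OUT.mp4 VIDEO_FILE AUDIO_FILE LANG [AUDIO_FILE LANG ...]
mux_languages() {
  local out=$1 video=$2
  shift 2
  local -a inputs=(-i "$video") maps=(-map 0:v:0) meta=()
  local idx=0
  while [ "$#" -ge 2 ]; do
    inputs+=(-i "$1")                                # becomes input number idx+1
    maps+=(-map "$((idx + 1)):a:0")                  # take its first audio stream
    meta+=("-metadata:s:a:$idx" "language=$2")       # tag output audio stream idx
    idx=$((idx + 1))
    shift 2
  done
  # -c copy remuxes without re-encoding; the first audio track becomes the default.
  local -a cmd=(ffmpeg "${inputs[@]}" "${maps[@]}" -c copy -disposition:a:0 default "${meta[@]}" "$out")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 mux_languages out.mp4 video.mp4 en.m4a eng zh.m4a zho ja.m4a jpn prints the full ffmpeg command, with one -map and one -metadata:s:a:N language=... pair per language.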

Streaming protocols (HLS/DASH) support independent audio track delivery — the player only downloads the language the user selected:

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="en",NAME="English",DEFAULT=YES,URI="audio/en/index.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="zh",NAME="中文",URI="audio/zh/index.m3u8"
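With many languages, these tags are easy to generate. A hypothetical sketch that prints one EXT-X-MEDIA tag per rendition, assuming the audio/<lang>/index.m3u8 layout used above. It also sets AUTOSELECT=YES, which lets the player choose a rendition that matches the device language when the user has not picked one.

```bash
# Hypothetical helper: print EXT-X-MEDIA tags for an HLS audio group.
# Usage: hls_audio_renditions GROUP_ID LANG NAME [LANG NAME ...]
# The first rendition is marked DEFAULT=YES, the others DEFAULT=NO.
hls_audio_renditions() {
  local group=$1 default=YES
  shift
  while [ "$#" -ge 2 ]; do
    printf '#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="%s",LANGUAGE="%s",NAME="%s",DEFAULT=%s,AUTOSELECT=YES,URI="audio/%s/index.m3u8"\n' \
      "$group" "$1" "$2" "$default" "$1"
    default=NO
    shift 2
  done
}
```

hls_audio_renditions audio en English zh 中文 produces the two tags above, with an explicit DEFAULT=NO on the second and AUTOSELECT=YES on both. Each #EXT-X-STREAM-INF variant in the master playlist then refers to the group with AUDIO="audio".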

More on this in Part 5: Streaming Protocols.


Loudness Normalization

Ever noticed the volume spike when a commercial cuts in? That’s because different content has wildly different loudness levels.

Loudness normalization adjusts all content to a uniform perceived loudness level (not peak volume).

Common Standards

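Standard | Target loudness | Used by
--- | --- | ---
EBU R128 | -23 LUFS | European broadcast
ATSC A/85 | -24 LKFS (same scale as LUFS) | North American broadcast
Spotify | -14 LUFS | Music streaming (default setting)
Apple Music | -16 LUFS | Music streaming (Sound Check)
YouTube | -14 LUFS | Video streaming (default)
Short-form / mobile | -16 to -14 LUFS | Phone speaker range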

LUFS (Loudness Units relative to Full Scale) is the unit of perceived loudness defined by the ITU-R BS.1770 measurement standard; a difference of 1 LU equals 1 dB.
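Because a 1 LU difference corresponds to a 1 dB gain change, the correction needed to reach a target is a subtraction. A minimal sketch with a hypothetical helper that prints the gain in dB and the equivalent linear amplitude factor. It ignores peaks: a pure gain boost can push peaks over the ceiling, which is why loudnorm also takes a true-peak (TP) limit.

```bash
# Hypothetical helper: gain needed to move measured integrated loudness to a target.
# gain_dB = target - measured (1 LU = 1 dB); linear factor = 10^(gain_dB / 20).
loudness_gain() {
  awk -v measured="$1" -v target="$2" 'BEGIN {
    gain = target - measured
    printf "%+.1f dB (x%.3f)\n", gain, 10 ^ (gain / 20)
  }'
}
```

loudness_gain -23 -14 prints +9.0 dB (x2.818): a broadcast master at -23 LUFS needs its amplitude multiplied by about 2.8 to reach a -14 LUFS streaming target.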

ffmpeg Loudness Normalization

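# Normalize audio to -14 LUFS (single pass).
# In dynamic mode loudnorm works at 192 kHz, so set -ar 48000 to keep a 48 kHz output.
ffmpeg -i input.mp4 -af loudnorm=I=-14:TP=-1.5:LRA=11 \
  -ar 48000 -c:a aac -b:a 128k -c:v copy output.mp4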
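A single loudnorm pass has no advance knowledge of the whole file's loudness, so it adjusts gain dynamically. For files on disk, a common pattern is two passes: measure first, then feed the measurements back so the filter can apply one linear gain. This is a sketch under stated assumptions: ffmpeg is on PATH, the function names are hypothetical, and the JSON report that print_format=json writes to stderr is parsed with sed rather than jq. Per the ffmpeg documentation, linear=true reverts to dynamic mode if the source LRA is above the target LRA or the gain would push true peaks over the TP limit.

```bash
# Extract a field such as input_i from loudnorm's JSON report (read from stdin).
# Report lines look like:  "input_i" : "-27.61",
loudnorm_field() {
  sed -n "s/.*\"$1\" *: *\"\([^\"]*\)\".*/\1/p"
}

# Hypothetical helper: two-pass loudness normalization to -14 LUFS.
# Usage: two_pass_loudnorm INPUT OUTPUT
two_pass_loudnorm() {
  local in=$1 out=$2 target="I=-14:TP=-1.5:LRA=11"
  local report mi mtp mlra mth off
  # Pass 1: measure only; audio is analyzed, output discarded, report captured from stderr.
  report=$(ffmpeg -hide_banner -nostats -i "$in" -vn \
    -af "loudnorm=$target:print_format=json" -f null - 2>&1) || return 1
  mi=$(printf '%s\n' "$report" | loudnorm_field input_i)
  mtp=$(printf '%s\n' "$report" | loudnorm_field input_tp)
  mlra=$(printf '%s\n' "$report" | loudnorm_field input_lra)
  mth=$(printf '%s\n' "$report" | loudnorm_field input_thresh)
  off=$(printf '%s\n' "$report" | loudnorm_field target_offset)
  # Pass 2: apply the measured values; linear=true requests a single linear gain.
  ffmpeg -hide_banner -i "$in" \
    -af "loudnorm=$target:measured_I=$mi:measured_TP=$mtp:measured_LRA=$mlra:measured_thresh=$mth:offset=$off:linear=true" \
    -ar 48000 -c:a aac -b:a 128k -c:v copy "$out"
}
```

Usage: two_pass_loudnorm input.mp4 output.mp4. The cost is decoding the audio twice, which is usually acceptable for offline VOD transcoding.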

Hands-On: Inspect and Transcode Audio

Check audio tracks in a video

ffprobe -v error -show_streams -select_streams a input.mp4

Typical output (abridged; -show_streams prints many more fields than these):

codec_name=aac
sample_rate=48000
channels=2
channel_layout=stereo
bit_rate=128000
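For scripted checks it helps to ask ffprobe for just these fields, one key=value per line (-show_entries with the default writer and noprint_wrappers=1), and compare them with the target spec. The checker is a hypothetical sketch that hard-codes the AAC / 48 kHz / stereo target used in this article and only inspects the first audio stream.

```bash
# Print selected fields of the first audio stream as key=value lines.
probe_audio() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name,sample_rate,channels,channel_layout,bit_rate \
    -of default=noprint_wrappers=1 "$1"
}

# Hypothetical check: read key=value lines on stdin, report any field that misses
# the AAC / 48 kHz / stereo target, print OK otherwise, and exit non-zero on failure.
check_audio_spec() {
  awk -F= '
    $1 == "codec_name"  && $2 != "aac"   { print "codec_name=" $2 " (expected aac)"; bad = 1 }
    $1 == "sample_rate" && $2 != "48000" { print "sample_rate=" $2 " (expected 48000)"; bad = 1 }
    $1 == "channels"    && $2 != "2"     { print "channels=" $2 " (expected 2)"; bad = 1 }
    END { if (!bad) print "OK"; exit bad }'
}
```

probe_audio input.mp4 | check_audio_spec prints OK for a file that already matches, or one line per mismatch, which makes it easy to decide whether the transcode step is needed at all.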

Standardize to AAC 48 kHz 128 kbps stereo

ffmpeg -i input.mov \
  -c:a aac -b:a 128k -ar 48000 -ac 2 \
  -c:v copy \
  output.mp4
  • -c:a aac: Audio codec AAC
  • -b:a 128k: 128 kbps bitrate
  • -ar 48000: 48 kHz sample rate
  • -ac 2: 2 channels (stereo)
  • -c:v copy: Copy video stream as-is (saves time)

Key Takeaways

  1. Digital audio requires sampling rate (temporal density) and bit depth (amplitude precision).
  2. VOD default: 48 kHz sample rate, 16-bit depth.
  3. Consumer streaming defaults to stereo (2.0); cinema uses 5.1 / Atmos.
  4. AAC-LC at 128 kbps is the default audio setting for VOD.
  5. A single video file can carry multiple audio tracks (multi-language).
  6. Loudness normalization (EBU R128 at -23 LUFS for broadcast, around -14 LUFS for streaming) prevents the “ads are too loud” problem.

Previous: Part 2: Video Codecs

Next: Part 4: Container Formats — MP4, fMP4, and CMAF