Tech Deep Dive
Source-code-level analysis and system design explorations. Covers databases, search engines, streaming, video technology, and architecture internals.
29 articles
VOD Deep Dive Part 1: Video Fundamentals — What Is a Video, Really?
The first installment of our 12-part VOD streaming series. Learn what video actually is at the byte level — pixels, resolution, frame rates, bitrate, I/P/B frames, GOP, color spaces, and HDR.
VOD Deep Dive Part 10: QoE Metrics — How to Measure What Users Actually Feel
QoE vs QoS, six core metrics (VST, RBR, VSF, EBVS, VPF, Avg Bitrate), data pipelines, multi-dimensional drill-down, troubleshooting cases, and when to buy vs build.
VOD Deep Dive Part 11: End-to-End Workflow — From Upload to Playback
The complete 10-step VOD production pipeline: upload, content moderation, probe, transcode, package, publish, CDN pre-warm, orchestration with Step Functions and Temporal, disaster recovery.
VOD Deep Dive Part 12: Building VOD on AWS — Services, Architecture, and Costs
Complete AWS VOD reference: MediaConvert, MediaPackage, CloudFront, S3, Step Functions, SPEKE DRM integration, Terraform IaC, real cost breakdowns, common pitfalls, and a production roadmap.
VOD Deep Dive Part 2: Video Codecs — Why a 4K Movie Fits in 5 GB
How video compression works, why H.264 still dominates, when to choose H.265 or AV1, per-title encoding, VMAF quality metrics, and hands-on ffmpeg examples.
VOD Deep Dive Part 3: Audio Fundamentals — Making Sound Small
How digital audio works: sampling rates, bit depth, channels, AAC vs Opus vs Dolby Atmos, multi-language tracks, loudness normalization, and practical ffmpeg recipes.
VOD Deep Dive Part 4: Container Formats — .mp4 Is Not a Codec
Containers vs codecs, MP4 internals (Box structure), the faststart trap, fragmented MP4, CMAF for unified HLS+DASH, segment length trade-offs, and subtitle formats.
VOD Deep Dive Part 5: Streaming Protocols — How HLS and DASH Actually Work
Why progressive download fails, how HLS two-level manifests and DASH MPD work, CMAF dual-manifest best practices, LL-HLS for low latency, and when to consider WebRTC.
VOD Deep Dive Part 6: Adaptive Bitrate — How Players Auto-Switch Quality
How ABR works under the hood: throughput-based, buffer-based (BBA), BOLA, MPC, and Pensieve algorithms. Plus practical engineering advice for bitrate ladders and short-form video.
VOD Deep Dive Part 7: CDN Distribution — Why It's Fast Everywhere
CDN architecture (Edge/Shield/Origin), caching strategies, request collapsing, signed URLs, pre-warming, JIT vs pre-packaging, multi-CDN strategies, HTTP/3, and cost estimation.
VOD Deep Dive Part 8: DRM Content Protection — Why Netflix Can't Be Screen-Recorded
Widevine, FairPlay, PlayReady explained. CENC/CBCS unified encryption, license flow, L1/L2/L3 security levels, HDCP, SPEKE integration, and lightweight protection for short-form video.
VOD Deep Dive Part 9: Video Players — From Manifest to First Frame
What happens inside a video player: Web (MSE/EME), iOS (AVPlayer), Android (ExoPlayer/Media3), TTFF optimization, buffering strategies, lip sync, and when to build vs buy.
Subtitle Position Detection with OpenCV and Amazon Nova
A hybrid CV + LLM pipeline for automatic subtitle detection — 6 iterations to reach 83% accuracy on multilingual video.
How AI Coding Agents Actually Work: A Source Code Deep Dive
We traced the source code of Amazon Q CLI and Claude Code to understand how AI coding agents really work under the hood.
OpenClaw vs Claude Code: Architecture and Strategy Compared
Two AI agent products, two radically different philosophies. A deep comparison of architecture, adoption, and what's next.
OpenClaw vs Claude Code Source Code: Two AI Agent Architectures
We compared 453K lines of OpenClaw TypeScript with Claude Code's 28K lines of Markdown. The architectures couldn't be more different.
Big Data on AWS Deep Dive (Part 10): Full Architecture Blueprint and Cost Breakdown
The complete end-to-end architecture for a social app's data warehouse and recommendation system on AWS — every service mapped, with real monthly cost estimates and optimization strategies.
Big Data on AWS Deep Dive (Part 9): SageMaker and the ML Platform — From Training to Production
A complete tour of SageMaker AI: Studio notebooks, Feature Store, Training Jobs, real-time Endpoints, Model Monitor, and how it all fits into the recommendation system MLOps workflow.
Big Data on AWS Deep Dive (Part 8): Online Feature Stores — DynamoDB, ElastiCache, and OpenSearch k-NN
How recommendation systems serve features at inference time: DynamoDB for user features, ElastiCache for hot caching, OpenSearch k-NN for vector recall, and Neptune for graph retrieval.
Big Data on AWS Deep Dive (Part 7): Recommendation System Fundamentals — Funnel, Two-Tower, and PIT
Understand the recommendation system funnel (recall → pre-rank → rank → re-rank), two-tower retrieval architecture, and why Point-in-Time correctness matters for training samples.
Big Data on AWS Deep Dive (Part 6): End-to-End Data Pipeline — From Source to Feature Store
Connect all the dots: trace a click event from client SDK through API Gateway, MSK, Firehose, S3, warehouse layers (ODS→DWD→DWS→ADS), to DynamoDB for real-time serving.
Big Data on AWS Deep Dive (Part 5): EMR, Glue ETL, Flink, and Pipeline Orchestration
Compare EMR Serverless, Glue ETL, and Managed Flink to choose the right compute engine. Then orchestrate data pipelines with MWAA (Airflow) and Step Functions.
Big Data on AWS Deep Dive (Part 4): Glue Catalog, Athena, and Lake Formation
How AWS Glue Data Catalog acts as the central directory for your data lake, and how Athena queries Parquet and Iceberg tables on S3 with serverless SQL.
Big Data on AWS Deep Dive (Part 3): Data Ingestion — DMS, Zero-ETL, Firehose, and MSK
Four data sources, four ingestion pipelines: learn CDC with DMS, Aurora Zero-ETL, Kafka on MSK, and Firehose micro-batching to land data in your S3 data lake.
Big Data on AWS Deep Dive (Part 2): S3, Parquet, and Apache Iceberg Explained
Master the storage foundation of modern data lakes — S3 object storage, Parquet columnar format, and how Iceberg adds ACID transactions to files on S3.
Big Data on AWS Deep Dive (Part 1): Data Lakes, Warehouses, and the Lakehouse Revolution
Understand the core big data concepts — data lake vs. data warehouse vs. lakehouse, OLTP vs. OLAP, and why modern analytics architectures converge on S3.
How to Design a Full-Site Search Engine with Elasticsearch
Multi-source indexing, CDC sync, permission-aware search, hot keywords, and typeahead — a complete Elasticsearch architecture guide.
Building a Knowledge Base Search Engine with FSCrawler and Elasticsearch
Index PDFs, Word docs, and scanned files into Elasticsearch with FSCrawler. Covers OCR, custom mappings, and production setup.
Adding a Unique Index to a 15-Million-Row MySQL Table: A Production War Story
We added a unique index to a 15M-row live table and caused an outage. Here's what went wrong and the right way to do it.