Big Data on AWS Deep Dive
A 10-part series on building a production data warehouse and recommendation system on AWS, from OLTP vs. OLAP basics to a full end-to-end architecture with cost estimates.
Who Is This For?
Backend engineers and data engineers who know SQL and have used a relational database, but haven't built a data lake or recommendation system before. No Hadoop experience required — we start from scratch and build up to a production architecture on AWS.
Part 1: Big Data Fundamentals
Data lake vs. warehouse vs. lakehouse, OLTP vs. OLAP, batch vs. stream
Part 2: Storage & File Formats
S3 object storage, Parquet columnar format, Apache Iceberg internals
Part 3: Data Ingestion
DMS CDC, Aurora Zero-ETL, Firehose micro-batching, MSK / Kafka
Part 4: Metadata & Query Engines
Glue Data Catalog, Athena serverless SQL, Lake Formation permissions
Part 5: Compute & Orchestration
EMR Serverless, Glue ETL, Managed Flink, MWAA, Step Functions
Part 6: End-to-End Data Pipeline
From client click → MSK → S3 → ODS → DWD → DWS → ADS → DynamoDB
Part 7: Recommendation Fundamentals
Funnel architecture, two-tower recall, point-in-time (PIT) training correctness
Part 8: Online Feature Stores
DynamoDB, ElastiCache, OpenSearch k-NN, Neptune graph recall
Part 9: SageMaker & ML Platform
Studio, Feature Store, Training Jobs, Endpoints, Model Monitor
Part 10: Full Architecture & Costs
Complete blueprint, every AWS service mapped, monthly cost breakdown