Foundation
Week 1 walks through the runtime mechanics that everything else depends on. By the end you can read a Spark UI, predict where shuffles will happen, explain why DataFrames are usually faster than RDDs, and tell client mode from cluster mode by what they break.
The 7 days
Cluster Mode Overview
How Spark works under the hood: drivers, executors, cluster managers, and how your code becomes work on the cluster.
- The three main players: driver, cluster manager, executors
- Application → Job → Stage → Task hierarchy
- … and 4 more
RDDs — The Foundation
What an RDD really is, the five properties, transformations, actions, lineage, and why lineage makes fault tolerance cheap: lost partitions are recomputed from their parents rather than restored from replicas.
- What RDD stands for and why immutability matters
- The five internal properties of every RDD
- … and 4 more
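The lineage idea from this day can be sketched without a cluster. What follows is a pure-Python toy, not Spark's implementation: `ToyRDD`, `read_source`, and `source_reads` are hypothetical names made up for illustration. It assumes only that a dataset stores a recipe (its parent plus a function) instead of the data itself, so transformations stay lazy and a lost result can be rebuilt by replaying the recipe.

```python
class ToyRDD:
    """Hypothetical toy class: stores a recipe for its data, not the data."""

    def __init__(self, compute, parent=None, name="source"):
        self._compute = compute  # zero-argument function that produces the records
        self.parent = parent     # the dataset this one was derived from
        self.name = name

    def map(self, fn):
        # Transformation: returns a new recipe, computes nothing yet.
        return ToyRDD(lambda: [fn(x) for x in self._compute()], self, "map")

    def filter(self, pred):
        # Transformation: returns a new recipe, computes nothing yet.
        return ToyRDD(lambda: [x for x in self._compute() if pred(x)], self, "filter")

    def collect(self):
        # Action: replays the whole chain of recipes back to the source.
        return self._compute()

    def lineage(self):
        # Walk parent links from this dataset back to the source.
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


source_reads = []  # grows by one each time the source is actually read


def read_source():
    source_reads.append("read")
    return list(range(6))


source = ToyRDD(read_source)
even_squares = source.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

reads_before_action = len(source_reads)  # transformations alone read nothing
first = even_squares.collect()
second = even_squares.collect()          # a "lost" result rebuilt from lineage
```

Calling `collect()` a second time reads the source again, which is the cost side of the trade: recovery needs no stored copy of the data, but it does pay for recomputation unless the result was persisted.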
Shuffles, Partitioning & Persistence
The performance trio: what really happens during a shuffle, how partitioning controls parallelism, and when caching actually helps.
- Anatomy of a shuffle: map side, disk, network, reduce side
- Why shuffles are expensive (network, disk, serialization, GC)
- … and 6 more
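The map-side and reduce-side halves of a shuffle can be illustrated with a small pure-Python simulation. This is a sketch under stated assumptions, not Spark code: `partition_for` and `shuffle` are hypothetical helpers, and `zlib.crc32` stands in for whatever hash function a real partitioner uses. It shows only the routing rule (hash of the key, modulo the number of reduce partitions) and leaves out the disk writes, network transfer, and serialization that make real shuffles expensive.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    # Deterministic stand-in for a hash partitioner: hash(key) mod partitions.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(map_partitions, num_reduce_partitions):
    """Route every (key, value) record to its reduce-side partition."""
    buckets = defaultdict(list)
    for records in map_partitions:      # map side: one pass per input partition
        for key, value in records:
            buckets[partition_for(key, num_reduce_partitions)].append((key, value))
    # Reduce side: one list of records per output partition.
    return [buckets[i] for i in range(num_reduce_partitions)]


# Three input partitions with the same keys scattered across them.
map_side = [
    [("a", 1), ("b", 1)],
    [("a", 1), ("c", 1)],
    [("b", 1), ("a", 1)],
]
reduce_side = shuffle(map_side, num_reduce_partitions=2)

# After the shuffle each key lives in exactly one partition,
# so summing within partitions gives the global count per key.
counts = {}
for part in reduce_side:
    for key, value in part:
        counts[key] = counts.get(key, 0) + value
```

Every record with the same key crosses from whichever input partition held it to a single output partition, which is why key-based operations force data movement and why the number of reduce partitions sets the parallelism of the next stage.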
Shared Variables — Broadcast & Accumulators
The two ways Spark lets the driver and executors share state — and why every other approach silently breaks.
- The closure problem made concrete
- Broadcast variables: what they are and how they work
- … and 4 more
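The closure problem can be made concrete with a pure-Python simulation. This is a sketch, not Spark itself: `run_task_on_executor` is a hypothetical function, and `pickle` stands in for the serialization step that ships driver-side variables to each task. The assumption it illustrates is that every task works on its own deserialized copy, so mutations never travel back to the driver on their own.

```python
import pickle


def run_task_on_executor(serialized_state: bytes, records):
    # Each task deserializes a private copy of the driver's variables.
    state = pickle.loads(serialized_state)
    for r in records:
        state["counter"] += r  # mutates the copy, not the driver's object
    return state["counter"]


driver_state = {"counter": 0}
payload = pickle.dumps(driver_state)  # what gets shipped to the tasks

partitions = [[1, 2, 3], [4, 5]]
task_local_totals = [run_task_on_executor(payload, p) for p in partitions]

# The driver's variable was never touched by the tasks.
driver_counter_after = driver_state["counter"]

# Accumulator-style fix: tasks report partial results and the driver merges them.
merged = sum(task_local_totals)
```

The last line mirrors the general shape of an accumulator: updates flow one way, from tasks to the driver, and are merged there. Broadcast variables cover the opposite direction, read-only data sent from the driver to the executors.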
SparkSession & SparkContext in depth
The two entry points you've been using since Day 1: how they relate, builder options, SparkConf, and the runtime configuration interface (spark.conf).
- The SparkSession vs SparkContext relationship
- Builder options: appName, master, config
- … and 3 more
spark-submit & Application Deployment
How you actually run things in production: spark-submit flags, packaging, dependency management, and the operational mechanics of deploys.
- spark-submit flags in depth
- Packaging applications and dependencies
- … and 3 more
Week 1 review & consolidation
Bring the foundation phase together — review the cluster model end-to-end and consolidate what you've learned about RDD-level Spark.
- End-to-end cluster execution walkthrough
- Recurring patterns across Days 1–6
- … and 2 more