allaboutspark
Mastering Apache Spark
Week 1 · 7 days · ~495 minutes reading · 26 widgets · 104 questions

Foundation

Week 1 walks through the runtime mechanics that everything else depends on. By the end, you can read a Spark UI, predict where shuffles will happen, explain why DataFrames are usually faster than RDDs, and tell client mode from cluster mode by what breaks in each.

The 7 days

Day 1
70m · 5 widgets · 15q

Cluster Mode Overview

How Spark works under the hood: drivers, executors, cluster managers, and how your code becomes work on the cluster.

  • The three main players: driver, cluster manager, executors
  • Application → Job → Stage → Task hierarchy
  • … and 4 more
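
To make the Application → Job → Stage → Task hierarchy a little more concrete, here is a rough sketch in plain Python. It is not Spark's scheduler: it models one job as a list of transformation names and starts a new stage wherever a shuffle-producing (wide) transformation appears. The names `WIDE_OPS`, `split_into_stages`, and `tasks_in_job` are hypothetical, and the list of wide operations is partial. Real Spark derives stages from RDD dependencies, so an operation such as a join on co-partitioned data may not shuffle at all.

```python
# Illustrative model only; Spark's DAGScheduler works on RDD dependencies,
# not on operation names.

# Hypothetical, partial set of transformations that normally need a shuffle.
WIDE_OPS = {"reduceByKey", "groupByKey", "join", "repartition", "sortByKey", "distinct"}


def split_into_stages(ops):
    """Group a linear chain of transformations into stages.

    A wide operation marks a shuffle boundary, so it begins a new stage;
    narrow operations are pipelined into the current stage.
    """
    stages = [[]]
    for op in ops:
        # Start a new stage at a shuffle boundary (unless the stage is still empty).
        if op in WIDE_OPS and stages[-1]:
            stages.append([])
        stages[-1].append(op)
    return stages


def tasks_in_job(stages, num_partitions):
    """One task per partition per stage.

    Assumes every stage keeps the same partition count, which real jobs
    often do not.
    """
    return len(stages) * num_partitions


# One action triggers one job; the job below contains two shuffle boundaries.
job = ["textFile", "flatMap", "map", "reduceByKey", "filter", "distinct"]
stages = split_into_stages(job)

for i, stage in enumerate(stages):
    print(f"stage {i}: {' -> '.join(stage)}")
print("tasks:", tasks_in_job(stages, num_partitions=4))
```

Counting the wide operations in a chain is a reasonable first guess at how many shuffles the Spark UI will show for that job.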
Day 2
80m · 5 widgets · 15q

RDDs — The Foundation

What an RDD really is, the five properties, transformations, actions, lineage, and why lineage makes fault tolerance essentially free.

  • What RDD stands for and why immutability matters
  • The five internal properties of every RDD
  • … and 4 more
Coming soon
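
The lineage idea from Day 2 can be sketched without Spark at all. The toy class below is an assumption-laden stand-in, not the RDD API: `MiniRDD` and its methods are hypothetical names. It shows the three properties that matter here: each transformation returns a new object instead of mutating the old one, nothing is computed until a partition is asked for, and a "lost" partition is rebuilt simply by replaying the chain of functions that produced it.

```python
class MiniRDD:
    """Toy stand-in for an RDD: immutable, lazy, and rebuilt from its lineage."""

    def __init__(self, compute, parent=None, name="parallelize"):
        self._compute = compute  # function: partition index -> list of records
        self.parent = parent     # the dataset this one was derived from
        self.name = name

    @staticmethod
    def parallelize(data, num_partitions):
        # Split the input round-robin into fixed partitions.
        chunks = [list(data[i::num_partitions]) for i in range(num_partitions)]
        return MiniRDD(lambda i: chunks[i])

    def map(self, f):
        # Returns a new dataset; nothing runs until a partition is computed.
        return MiniRDD(lambda i: [f(x) for x in self._compute(i)], parent=self, name="map")

    def filter(self, pred):
        return MiniRDD(lambda i: [x for x in self._compute(i) if pred(x)], parent=self, name="filter")

    def lineage(self):
        # Walk back through the parents to recover the recipe.
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def compute_partition(self, i):
        # Recomputing is just replaying the recipe for partition i.
        return self._compute(i)


base = MiniRDD.parallelize(range(10), num_partitions=2)
result = base.map(lambda x: x * 2).filter(lambda x: x % 3 == 0)

print(result.lineage())
first_attempt = result.compute_partition(1)
# Pretend the executor holding partition 1 died: replay the lineage.
recomputed = result.compute_partition(1)
print(first_attempt == recomputed)
```

No copy of the data was kept anywhere except the source; recovery costs only the recomputation of the one partition that was lost.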
Day 3
90m · 4 widgets · 15q

Shuffles, Partitioning & Persistence

The performance trio: what really happens during a shuffle, how partitioning controls parallelism, and when caching actually helps.

  • Anatomy of a shuffle: map side, disk, network, reduce side
  • Why shuffles are expensive (network, disk, serialization, GC)
  • … and 6 more
Coming soon
Day 4
80m · 4 widgets · 15q

Shared Variables — Broadcast & Accumulators

The two ways Spark lets the driver and executors share state — and why every other approach silently breaks.

  • The closure problem made concrete
  • Broadcast variables: what they are and how they work
  • … and 4 more
Coming soon
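
The closure problem from Day 4 can be imitated in plain Python. In the sketch below, `pickle` stands in for the serialization Spark performs when it ships a task to an executor, and `run_task` is a hypothetical helper, not a Spark function. Each task works on its own deserialized copy of the captured state, so updates made inside the task never reach the driver's variable. Returning a value from each task and merging on the driver is the pattern that accumulators formalize.

```python
import pickle


def run_task(serialized_state, partition):
    """Simulate one task running on an executor (hypothetical helper)."""
    # The task sees a private copy of whatever the closure captured.
    state = pickle.loads(serialized_state)
    for x in partition:
        state["counter"] += x
    # Only an explicitly returned value travels back to the driver.
    return state["counter"]


driver_state = {"counter": 0}
payload = pickle.dumps(driver_state)  # what the driver ships with every task
partitions = [[1, 2, 3], [4, 5, 6]]

task_results = [run_task(payload, p) for p in partitions]

# The driver's own variable was never touched by the tasks.
print("driver counter:", driver_state["counter"])
# Merging per-task results on the driver gives the intended total.
print("merged total:", sum(task_results))
```

In local mode the broken version can appear to work because everything may share one JVM or process, which is why this bug tends to surface only on a real cluster.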
Day 5
60m · 3 widgets · 12q

SparkSession & SparkContext in depth

The two entry points you've been using since Day 1: how they relate, builder options, SparkConf, and the runtime configuration exposed through spark.conf.

  • The SparkSession vs SparkContext relationship
  • Builder options: appName, master, config
  • … and 3 more
Coming soon
Day 6
70m · 3 widgets · 12q

spark-submit & Application Deployment

How you actually run things in production: spark-submit flags, packaging, dependency management, and the operational mechanics of deploys.

  • spark-submit flags in depth
  • Packaging applications and dependencies
  • … and 3 more
Coming soon
Day 7
45m · 2 widgets · 20q

Week 1 review & consolidation

Bring the foundation phase together — review the cluster model end-to-end and consolidate what you've learned about RDD-level Spark.

  • End-to-end cluster execution walkthrough
  • Recurring patterns across Days 1–6
  • … and 2 more
Coming soon