Free study path · 4 weeks · interactive

Mastering Apache Spark.

A documentation-driven study path through how Apache Spark really works. Four weeks of long-form, source-grounded readings paired with interactive visualizations you can manipulate to see Spark's internals in action.

1 week · ~8 hours of reading · 26 interactive widgets · Quiz-gated progression

Start Week 1

One email to unlock everything. Used to remember your progress across devices.

What this study path is, and isn't

This is not a fast-paced bootcamp, and it is not a video lecture series. It is a serious, documentation-driven written study path — closer to working through a well-written book than watching tutorials. Each week pairs roughly 70–90 minutes of careful reading with four or five interactive visualizations that let you manipulate Spark internals directly: change executor counts and watch parallelism shift, drag transformations into a pipeline and see stages form at shuffle boundaries, kill an executor and watch the lineage walk back to rebuild a lost partition.

Every concept is grounded in the official Apache Spark documentation (spark.apache.org/docs/latest). When the docs say something, that's what you'll read. When the docs are quiet on a topic that matters, you'll see the trade-offs spelled out explicitly. No fluff, no hype, no marketing — just how Spark works.

At the end of each week is a 15-question self-check quiz. Score 80% to unlock the next week. Wrong answers link back to the exact section of the reading that explains the concept — so "getting it wrong" becomes the most efficient way to find the gap in your understanding.

The weeks

Week 1 · 495m read · 26 widgets · 104-question quiz
Foundation
Seven days on the core Spark execution model — how code becomes work on the cluster.
- Week 1 walks through the runtime mechanics that everything else depends on. By the end you can read a Spark UI, predict where shuffles will happen, explain why DataFrames are usually faster than RDDs, and tell client mode from cluster mode by what they break.

Why interactive visualizations

Most Spark tutorials hand you code. That's useful when you already know what you should expect to see. It's much less useful when you're trying to build the intuition for why a shuffle is expensive, or why one task in a stage holds up everyone else, or what the difference between repartition and coalesce actually looks like inside the cluster.
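To make the repartition versus coalesce contrast concrete, here is a minimal pure-Python sketch. It does not use Spark; the function names and the list-of-lists model of partitions are assumptions made for illustration. It assumes the commonly documented behavior: coalesce merges whole existing partitions without a shuffle, while repartition redistributes every record through a full shuffle. Real Spark also considers data locality when grouping partitions, which this sketch ignores.

```python
from typing import List

Partition = List[int]


def coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Merge whole parent partitions into n groups; no record-level movement."""
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Each parent partition lands in exactly one child partition, intact.
        merged[i * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Redistribute every record round-robin, standing in for a full shuffle."""
    out: List[Partition] = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for record in part:
            out[i % n].append(record)
            i += 1
    return out


# Four skewed partitions: one large, three small.
parts = [[1, 2, 3, 4, 5, 6], [7], [8], [9, 10]]
print([len(p) for p in coalesce(parts, 2)])     # skew survives the merge
print([len(p) for p in repartition(parts, 2)])  # sizes even out
```

The trade-off the sketch shows: coalesce is cheap because nothing is reshuffled, but it inherits whatever skew the parent partitions had; repartition pays for moving every record and gets balanced partitions in return.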

The widgets in this course are designed to give you that intuition before you ever open a Spark UI. Each one strips away the runtime noise and lets you drive a single concept directly. Move the executors slider and the parallelism number updates immediately. Pick wider partitions and watch the shuffle file count grow. Mark an operation as wide and see the stage boundary appear. The point isn't to simulate Spark; it's to make the mental model so concrete that the real Spark UI becomes legible at a glance.
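The "mark an operation as wide, see the stage boundary appear" idea can be sketched in a few lines of plain Python. This is a deliberate simplification, not Spark's scheduler: the `WIDE_OPS` set and `split_into_stages` are hypothetical names, the classification is not exhaustive (a join, for example, can avoid a shuffle when its inputs are already co-partitioned), and in real Spark the map side of a shuffle runs in the upstream stage.

```python
from typing import List

# Assumed classification for illustration only; not an exhaustive list.
WIDE_OPS = {"groupByKey", "reduceByKey", "join", "repartition", "distinct"}


def split_into_stages(pipeline: List[str]) -> List[List[str]]:
    """Start a new stage at every wide (shuffle-producing) operation."""
    stages: List[List[str]] = [[]]
    for op in pipeline:
        # A wide operation closes the current stage, unless it is still empty.
        if op in WIDE_OPS and stages[-1]:
            stages.append([])
        stages[-1].append(op)
    return stages


pipeline = ["textFile", "flatMap", "map", "reduceByKey", "filter", "repartition", "map"]
for i, stage in enumerate(split_into_stages(pipeline)):
    print(f"stage {i}: {stage}")
```

Narrow operations such as `map` and `filter` stay pipelined inside one stage; each wide operation forces a new one, which is why a pipeline with two shuffles shows up as three stages.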

Who this is for

Data engineers who use Spark daily and want to fill in the gaps that cargo-cult Stack Overflow answers have left in their mental model.
Software engineers crossing into data who already know how distributed systems work in general but want to understand Spark's specific choices.
Senior engineers tuning slow jobs who can read a Spark UI but can't always tell which knob to turn first.
People preparing for data engineering interviews where the conversation drifts to "okay, explain how reduceByKey actually works under the hood."
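As a taste of that last interview question, here is a rough pure-Python model of the three phases usually attributed to reduceByKey: a map-side combine within each partition, a shuffle that routes each key to a reducer by hash, and a reduce-side merge of the partial results. The function name, the in-memory dictionaries, and the `num_reducers` parameter are assumptions for illustration; real Spark spills to disk, serializes data, and moves it over the network.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, int]


def reduce_by_key(
    partitions: List[List[Pair]],
    func: Callable[[int, int], int],
    num_reducers: int,
) -> Dict[str, int]:
    # Phase 1: map-side combine. Each partition reduces its own keys locally,
    # so at most one partial value per key leaves each partition.
    combined: List[Dict[str, int]] = []
    for part in partitions:
        local: Dict[str, int] = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        combined.append(local)

    # Phase 2: shuffle. Route each partial to a reducer chosen by the key's hash,
    # so all partials for one key end up in the same bucket.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for local in combined:
        for key, partial in local.items():
            buckets[hash(key) % num_reducers][key].append(partial)

    # Phase 3: reduce-side merge. Fold the partials for each key into one value.
    result: Dict[str, int] = {}
    for bucket in buckets:
        for key, partials in bucket.items():
            acc = partials[0]
            for p in partials[1:]:
                acc = func(acc, p)
            result[key] = acc
    return result


words = [[("a", 1), ("b", 1), ("a", 1)], [("b", 1), ("a", 1)]]
print(reduce_by_key(words, lambda x, y: x + y, num_reducers=2))
```

The point of phase 1 is that only one record per key per partition crosses the shuffle, which is the usual argument for preferring reduceByKey over groupByKey followed by a reduction.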

How the gating works

You unlock the course with an email address. That's the entire payment. The email is used to remember your progress across devices — so you can read on a laptop and quiz on a phone — and to occasionally tell you when a new study path goes live. No spam, unsubscribe anytime, and we don't share the address.

Each week is gated by the previous week's quiz. Score 80% (12 out of 15) and the next week unlocks automatically. Score lower and you can retake the quiz; each wrong answer comes back with a section link pointing to the exact passage of the reading that answers the question. That feedback loop is the whole reason for the gate.
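The threshold arithmetic can be checked with a small sketch. The function name and parameters are hypothetical, and integer arithmetic is used so that the 12-out-of-15 boundary is not subject to floating-point rounding.

```python
def passes_gate(correct: int, total: int = 15, threshold_percent: int = 80) -> bool:
    """Return True when correct/total meets the threshold, using integers only."""
    return correct * 100 >= threshold_percent * total


print(passes_gate(12))  # 12/15 is exactly 80%
print(passes_gate(11))  # 11/15 is about 73%
```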

Start Week 1: Foundation