Speaker context — session plan:
- ~15 min theory (partitioning/bucketing)
- ~15 min Hive optimization demo
- ~30 min Apache Pig, including live Grunt shell demos
Students should leave able to write basic Pig Latin scripts and to judge when to pick Hive vs. Pig vs. Spark SQL. After the three-way comparison, close with the historical "dataflow spectrum" context and a brief Pig vs. Beam comparison; this sets up the intellectual thread that runs from Pig through Spark to Beam. Remind at the start: Assignment 2 is due Sunday.
ILLUSTRATE is a killer debugging tool: it runs the script on a small, partly fabricated sample of the input and shows the rows and schemas at every intermediate step. Demo this prominently.
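A ready-to-paste Grunt snippet for the demo (a minimal sketch; the file name `users.csv` and its schema are hypothetical):

```pig
-- Hypothetical input: a comma-separated file of (name, age) records.
grunt> users  = LOAD 'users.csv' USING PigStorage(',')
                AS (name:chararray, age:int);
grunt> adults = FILTER users BY age >= 18;
grunt> result = FOREACH adults GENERATE name;
-- ILLUSTRATE fabricates a few representative rows and shows them
-- flowing through every statement above, with schemas at each step:
grunt> ILLUSTRATE result;
```

Point out that ILLUSTRATE is much faster than DUMP on real data, since it never touches the full input.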
Emphasize: Pig didn't fail — it succeeded so well that its core ideas were absorbed into every modern big-data tool. Spark's RDD transformations ARE Pig Latin with Python syntax.
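A side-by-side sketch to make the point concrete (data and names are hypothetical; the rough PySpark RDD equivalent of each step is shown as a comment):

```pig
-- Spark: clicks = sc.textFile('clicks.txt').map(parse)
clicks  = LOAD 'clicks.txt' AS (user:chararray, url:chararray);

-- Spark: hits = clicks.filter(lambda c: 'product' in c.url)
hits    = FILTER clicks BY url MATCHES '.*product.*';

-- Spark: counts = hits.map(lambda c: (c.user, 1))
--                     .reduceByKey(lambda a, b: a + b)
grouped = GROUP hits BY user;
counts  = FOREACH grouped GENERATE group AS user, COUNT(hits) AS n;
```

Each Pig operator is a lazy relational transformation over an immutable dataset — exactly the mental model Spark inherited.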