Speaker context: Students just finished a week of Spark DataFrames and SQL. Today we introduce Hive, the original SQL-on-Hadoop story. The key conceptual shift: Hive is built for batch data warehousing, not interactive analysis. Emphasize the Hive metastore as a shared catalog that Spark, Presto, and Impala can all use. Assignment 2 is released today: give the 2-minute overview near the start, then dive in. Demo-first, explain-after works well here.
Demo tip: Run this live. Students often have connection issues. If HiveServer2 is slow to start, have them run: docker restart hiveserver2, then wait 30 seconds before reconnecting.
Point out: these queries may take 30-60 seconds on the MapReduce backend. That's expected and worth discussing: Hive is a batch system, not an interactive one.
Circulate: join syntax, date extraction (SUBSTR or FROM_UNIXTIME), and null-check patterns are common sticking points. Slow query times are expected, so reassure students.
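Reference sketch: if a student is stuck, a query along these lines touches all three sticking points at once. The tables and columns (orders, customers, order_ts, order_date, amount, region) are hypothetical and not taken from the assignment, so adapt them to whatever schema the demo uses. It assumes order_date is a 'yyyy-MM-dd' string and order_ts is epoch seconds.

```sql
-- Hypothetical schema (assumption, not the assignment's tables):
--   orders(order_id, customer_id, order_ts BIGINT, order_date STRING, amount DOUBLE)
--   customers(customer_id, name, region)
SELECT
  c.name,
  SUBSTR(o.order_date, 1, 7)            AS month_from_string,  -- 'yyyy-MM-dd' -> 'yyyy-MM'
  FROM_UNIXTIME(o.order_ts, 'yyyy-MM')  AS month_from_epoch,   -- epoch seconds -> 'yyyy-MM'
  COALESCE(c.region, 'unknown')         AS region,             -- default for unmatched or NULL region
  SUM(o.amount)                         AS total_amount
FROM orders o
LEFT JOIN customers c                   -- keeps orders that have no matching customer
  ON o.customer_id = c.customer_id
WHERE o.amount IS NOT NULL              -- use IS NULL / IS NOT NULL; "= NULL" never matches
GROUP BY
  c.name,
  SUBSTR(o.order_date, 1, 7),
  FROM_UNIXTIME(o.order_ts, 'yyyy-MM'),
  COALESCE(c.region, 'unknown');
```

Things to point at while circulating: the join condition goes in ON rather than WHERE, SUBSTR positions are 1-based, and every non-aggregated SELECT expression is repeated in GROUP BY.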