Speaker context: This session moves from MapReduce theory to hands-on coding with mrjob, a Python library that simplifies MapReduce development. The audience includes students from CS, business, and math backgrounds; all have SQL experience, but their programming comfort varies. Build on their SQL intuition (GROUP BY, aggregation) while introducing mrjob's class-based approach. Emphasize local testing workflows so students gain confidence before submitting jobs to the cluster.
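Speaker notes: One possible way to make the GROUP BY analogy concrete before opening mrjob is the plain-Python sketch below. It is only an illustration: the sales rows, `ROWS`, and the function names are hypothetical and do not come from mrjob or from the course material. It mimics what `SELECT region, SUM(amount) ... GROUP BY region` does, split into map, shuffle, and reduce steps.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sales rows as (region, amount); illustrative data only.
ROWS = [("east", 10), ("west", 5), ("east", 7), ("west", 1), ("north", 4)]


def map_phase(rows):
    """Emit one (key, value) pair per row: the GROUP BY column and the measure."""
    for region, amount in rows:
        yield region, amount


def shuffle_phase(pairs):
    """Sort by key and collect the values per key, as GROUP BY does before aggregating."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_phase(grouped):
    """Aggregate each group, playing the role of SUM(amount)."""
    for key, values in grouped:
        yield key, sum(values)


def total_by_region(rows):
    """Chain the three phases and return {region: total}."""
    return dict(reduce_phase(shuffle_phase(map_phase(rows))))


if __name__ == "__main__":
    print(total_by_region(ROWS))
```

The point to draw out for SQL users is that they already think in terms of the reduce step; MapReduce simply makes the "pick the key" and "group by key" steps explicit.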
Speaker notes: Many students with business/math backgrounds are more comfortable with Python than Java. mrjob provides a Pythonic API that feels natural while hiding Hadoop complexity. Emphasize portability: the same job script can run locally for testing and on a Hadoop cluster without code changes.
Speaker notes: Show the installation live. Mention that the Docker environment has mrjob pre-installed. On their personal machines, students who lack admin rights can run pip install --user mrjob.
Speaker notes: Live-code wordcount.py step by step. Start with the basic class structure, then add the mapper, then the reducer. Emphasize how much cleaner this is than hand-written stdin/stdout scripts. For business students, relate class methods to defining functions in Excel VBA or SQL procedures.
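Speaker notes: A rough sketch of the shape the live-coded class might take is shown below. To keep it runnable without mrjob or Hadoop, `WordCount` here is a plain class and `run_locally` is a hypothetical stand-in for the framework; neither is the actual wordcount.py from the demo. The method signatures follow the ones mrjob uses: `mapper(self, key, value)` and `reducer(self, key, values)`, both written as generators.

```python
import re
from itertools import groupby
from operator import itemgetter

WORD_RE = re.compile(r"[\w']+")


class WordCount:  # in the live demo this would be: class WordCount(MRJob)
    """Word count written with the method shapes used in mrjob jobs."""

    def mapper(self, _, line):
        # The input key is ignored; each word in the line yields a (word, 1) pair.
        for word in WORD_RE.findall(line):
            yield word.lower(), 1

    def reducer(self, word, counts):
        # counts iterates over every 1 emitted for this word.
        yield word, sum(counts)


def run_locally(job, lines):
    """Hypothetical stand-in for the framework: map, sort by key, then reduce."""
    mapped = [pair for line in lines for pair in job.mapper(None, line)]
    mapped.sort(key=itemgetter(0))
    results = []
    for word, group in groupby(mapped, key=itemgetter(0)):
        results.extend(job.reducer(word, (count for _, count in group)))
    return results


if __name__ == "__main__":
    for word, count in run_locally(WordCount(), ["the cat sat", "The cat ran"]):
        print(word, count)
```

With mrjob installed, the same two methods should carry over unchanged: add `from mrjob.job import MRJob`, subclass `MRJob`, drop `run_locally`, and call `WordCount.run()` under the `if __name__ == "__main__":` guard. Building the class in that order (skeleton, mapper, reducer) matches the live-coding sequence above.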
Speaker notes: Demo YARN UI navigation live. Show the counters for a completed job. For business students, relate this to SQL query execution plans or ETL pipeline dashboards.
Speaker notes: Critical hands-on demo. Go slowly and explain each step before executing it. Pause to let students catch up. Point out the expected timing (a small file should take roughly 30-60 seconds). Show mrjob's progress output in the terminal.