Kafka in Practice

CS 6500 — Week 12, Session 2

CS 6500 — Big Data Analytics | Week 12

Crash Question

"Your consumer processes 50 payment events and writes them to the billing database. Then it crashes — before committing the Kafka offset. When it restarts, does it charge customers again for those 50 events, or silently skip them?"


It Depends on You

The answer depends entirely on how you write the consumer. Today you'll build both the right way and the wrong way — and watch the difference happen live.

Session 2 covers:

  • Kafka CLI: topics, producers, consumers, and consumer group inspection
  • Python producer: publishing JSON events with delivery callbacks
  • Python consumer: manual offset control and batch commit patterns
  • Consumer group scaling: partition rebalancing in real time
  • Failure simulation: at-least-once semantics made concrete

What you'll build:

  • A producer that publishes 100 synthetic clickstream events to Kafka
  • A consumer with manual offset control that tracks event-type counts
  • A consumer group scaling demo and a crash simulation

All exercises run in your Docker environment. Kafka and ZooKeeper should already be running from Session 1.


Environment Check

Verify your environment before the lab starts:

# Install Python client once (host machine)
python3 -m pip install confluent-kafka

# Move to the Kafka lab scripts
cd examples/12_Kafka_Labs

# Confirm Kafka is running
docker ps | grep kafka

# Confirm topics exist (should see clickstream and orders from Session 1)
docker exec -it kafka-broker kafka-topics \
  --bootstrap-server localhost:9092 --list

# Create the lab topic if needed
docker exec -it kafka-broker kafka-topics \
  --bootstrap-server localhost:9092 \
  --create --topic clickstream \
  --partitions 3 --replication-factor 1

Kafka CLI Tools

Everything you need from the terminal


Creating Topics

# Create a topic with 3 partitions
docker exec -it kafka-broker kafka-topics \
  --bootstrap-server localhost:9092 \
  --create --topic orders --partitions 3 --replication-factor 1

# Describe topic: show leader, replicas, ISR for each partition
docker exec -it kafka-broker kafka-topics \
  --bootstrap-server localhost:9092 --describe --topic orders

Interactive Mode

# Interactive producer — one message per line
docker exec -it kafka-broker kafka-console-producer \
  --bootstrap-server localhost:9092 \
  --topic orders \
  --property "parse.key=true" \
  --property "key.separator=:"

Type these messages at the > prompt:

user_123:{"order_id": "A1", "amount": 49.99, "status": "placed"}
user_456:{"order_id": "A2", "amount": 12.50, "status": "placed"}
user_123:{"order_id": "A3", "amount": 89.00, "status": "placed"}

Press Ctrl+C when done. Note: user_123 messages always land on the same partition (same key hash).
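The "same key, same partition" behavior falls out of hashing the key and taking it modulo the partition count. A minimal Python illustration (MD5 is used here only as a stand-in hash; Kafka's real partitioners differ, murmur2 in the Java client and a CRC32-based scheme in librdkafka, so the partition numbers below won't match your broker):

```python
import hashlib

def toy_partition(key: str, num_partitions: int) -> int:
    """Deterministic key -> partition mapping (illustration only)."""
    digest = hashlib.md5(key.encode('utf-8')).digest()
    return int.from_bytes(digest[:4], 'big') % num_partitions

# The same key always maps to the same partition:
for key in ['user_123', 'user_456', 'user_123']:
    print(key, '->', toy_partition(key, 3))
```

Because the mapping is a pure function of the key, user_123 lands on one partition forever, which is what preserves per-user ordering.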


Consuming Topics

# Consume from the beginning (replay all messages)
docker exec -it kafka-broker kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic orders \
  --from-beginning \
  --property "print.key=true" \
  --property "print.partition=true" \
  --property "print.offset=true"

Offset Inspection

# Consume as a named group (tracks offset between runs)
docker exec -it kafka-broker kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic orders \
  --group lab-group \
  --from-beginning

# Inspect consumer group: current offset, log-end offset, LAG
docker exec -it kafka-broker kafka-consumer-groups \
  --bootstrap-server localhost:9092 \
  --group lab-group --describe

Key observation: --from-beginning replays all messages every time. --group with the same group.id resumes from the last committed offset.
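If you later want a named group to replay from the start without changing its group.id, you can reset its committed offsets. A sketch, assuming the same container and topic names as above (the group must have no active consumers or the reset is rejected):

```shell
# Rewind lab-group to the earliest offset on every partition of 'orders'
# (stop any running consumers in the group first)
docker exec -it kafka-broker kafka-consumer-groups \
  --bootstrap-server localhost:9092 \
  --group lab-group --topic orders \
  --reset-offsets --to-earliest --execute
```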


Python Producer

Publishing events with delivery confirmation


Producer Setup

# producer.py
from confluent_kafka import Producer
import json, time, random

conf = {
    'bootstrap.servers': 'localhost:9092',
    'acks': 'all',             # wait for all ISR replicas to acknowledge
    'retries': 3,              # retry on transient failures
    'linger.ms': 5             # batch messages for up to 5ms before sending
}
producer = Producer(conf)

Delivery Callback

def delivery_report(err, msg):
    """Called by Kafka client for every record that is produced."""
    if err:
        print(f'Delivery FAILED: {err}')
    else:
        print(f'Delivered → topic={msg.topic()} '
              f'partition={msg.partition()} offset={msg.offset()}')

The delivery_report callback is called asynchronously after the broker acknowledges (or rejects) each message. Use producer.flush() to wait for all outstanding callbacks before exiting.


Publishing Events

event_types = ['page_view', 'add_to_cart', 'purchase', 'search']

for i in range(100):
    user_id = f'user_{random.randint(1, 20)}'
    event = {
        'event_id':   i,
        'user_id':    user_id,
        'event_type': random.choice(event_types),
        'timestamp':  time.time(),
        'amount':     round(random.uniform(10, 500), 2)
                      if random.random() > 0.7 else None
    }
    producer.produce(
        topic='clickstream',
        key=user_id,               # same user → same partition → ordered events per user
        value=json.dumps(event),
        callback=delivery_report
    )
    producer.poll(0)               # serve delivery callbacks without blocking the loop

Key Points

producer.flush()                   # block until all outstanding messages are delivered
print('Done. Published 100 events.')

Key points:

  • key=user_id ensures all events for the same user land on the same partition
  • producer.poll(0) drains the internal callback queue without waiting
  • producer.flush() is essential before program exit — otherwise undelivered messages are dropped

Python Consumer

Manual offset control and batch processing


Offset Control

# consumer.py
from confluent_kafka import Consumer, KafkaError
import json

conf = {
    'bootstrap.servers': 'localhost:9092',
    'group.id':          'analytics-service',
    'auto.offset.reset': 'earliest',    # start from beginning if no prior offset
    'enable.auto.commit': False         # WE control when offsets advance
}
consumer = Consumer(conf)
consumer.subscribe(['clickstream'])

batch = []
BATCH_SIZE = 10

Poll Loop

try:
    while True:
        msg = consumer.poll(timeout=1.0)    # wait up to 1 second for a message
        if msg is None:
            continue
        if msg.error():
            if msg.error().code() == KafkaError._PARTITION_EOF:
                continue                    # reached end of partition — keep polling
            raise Exception(msg.error())

        event = json.loads(msg.value())
        batch.append((msg, event))

        if len(batch) >= BATCH_SIZE:
            # [processing happens here — see next slide]
            consumer.commit()               # advance ALL offsets in this batch atomically
            print(f'Committed batch of {BATCH_SIZE}.')
            batch = []
finally:
    consumer.close()

Batch Processing

if len(batch) >= BATCH_SIZE:
    event_counts = {}   # accumulate counts for this batch

    for _, event in batch:
        etype = event.get('event_type', 'unknown')
        event_counts[etype] = event_counts.get(etype, 0) + 1

        if etype == 'purchase' and event.get('amount'):
            print(f"  Purchase: user={event['user_id']} "
                  f"amount=${event['amount']:.2f}")

    print(f'Batch summary: {event_counts}')

Commit Last

    # Only commit AFTER all processing is confirmed successful
    # If anything above raises an exception, commit is never called
    # → Kafka will redeliver this batch on restart (at-least-once)
    consumer.commit()
    batch = []

Why commit last: if an exception occurs during processing, consumer.commit() is never reached. On restart, the consumer reads from the last committed offset — redelivering this batch. This is at-least-once semantics by design.
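At-least-once plus idempotent processing is effectively exactly-once from the database's point of view. A minimal sketch of dedup by event_id (the in-memory set and the apply_charge function are illustrative stand-ins; in production this would be a unique key or upsert in the billing database):

```python
processed_ids = set()   # in production: a UNIQUE key / upsert in the billing DB

def apply_charge(event):
    """Hypothetical stand-in for the real side effect (a DB write)."""
    print(f"charged {event['user_id']} for event {event['event_id']}")

def process_once(event):
    """Apply the side effect at most once per event_id."""
    if event['event_id'] in processed_ids:
        return False              # duplicate from a redelivered batch
    apply_charge(event)
    processed_ids.add(event['event_id'])
    return True

evt = {'event_id': 7, 'user_id': 'user_3'}
print(process_once(evt))   # True  (first delivery: the charge happens)
print(process_once(evt))   # False (redelivery after a crash: no second charge)
```

With this guard in place, redelivered batches are harmless: the side effect runs once, no matter how many times Kafka hands you the message.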


In-Class Exercise

Four tasks building on each other


Task 1: Producer

Goal: Publish 100 events and observe partition distribution.

# Run the producer
python3 producer.py

# Verify delivery: consume from beginning and show partition + offset
docker exec -it kafka-broker kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic clickstream \
  --from-beginning \
  --property print.partition=true \
  --property print.offset=true | head -20

Task 1: Verify

# Inspect partition distribution
docker exec -it kafka-broker kafka-consumer-groups \
  --bootstrap-server localhost:9092 \
  --group analytics-service --describe

Discuss with a neighbor:

  • What offset did each partition reach after 100 events? (They won't be equal — why not?)
  • Did all user_1 events land on the same partition? Verify by checking the output.

Task 2: Consumer

Goal: Add event-type counting; verify that restarting resumes from the last committed offset.

Modify consumer.py to:

  1. Maintain a running total_counts dictionary across all batches (not just per batch)
  2. Print the running totals every 20 events: {'page_view': 47, 'purchase': 12, ...}
  3. Change BATCH_SIZE to 20

Task 2: Replay

# Run 1: start the consumer, let it process all 100 events, stop it (Ctrl+C)
python3 consumer.py

# Check committed offset
docker exec -it kafka-broker kafka-consumer-groups \
  --bootstrap-server localhost:9092 --group analytics-service --describe

# Run 2: restart the consumer — does it reprocess old messages?
python3 consumer.py

Discuss: What was the committed offset after Run 1? Did Run 2 reprocess any messages? Why or why not?


Task 3: Scaling

Goal: Observe partition rebalancing when consumers join and leave.

# From examples/12_Kafka_Labs/

# Terminal 1: start Consumer A (group.id = 'scale-group')
python3 consumer_group_a.py

# Terminal 2: start Consumer B (same group.id = 'scale-group')
python3 consumer_group_b.py

# Terminal 3: watch partition assignments in real time
watch -n 2 "docker exec kafka-broker kafka-consumer-groups \
  --bootstrap-server localhost:9092 \
  --group scale-group --describe"

Task 3: Observe

What to observe:

  1. When only Consumer A is running: all 3 partitions assigned to A
  2. When Consumer B joins: Kafka triggers rebalancing — partitions split between A and B (2 + 1 or similar)
  3. Kill Consumer A (Ctrl+C in Terminal 1): watch B inherit A's partitions within ~10 seconds

Discuss: What partition assignment did each consumer get after B joined? How long did rebalancing take after A was killed?
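You can also watch rebalancing from inside the consumer rather than via the CLI: confluent_kafka accepts on_assign/on_revoke callbacks on subscribe(). A sketch (the logging format is our own; with a live broker, plug it into the consumer.py config from earlier):

```python
def fmt_assignment(partitions):
    """Render an assignment like 'clickstream[0, 2]' for logging."""
    ids = sorted(p.partition for p in partitions)
    topic = partitions[0].topic if partitions else '?'
    return f"{topic}{ids}"

def on_assign(consumer, partitions):
    print('assigned:', fmt_assignment(partitions))

def on_revoke(consumer, partitions):
    # A good place to commit before this consumer loses the partitions
    print('revoked:', fmt_assignment(partitions))

# With a live broker (consumer configured as in consumer.py):
# consumer.subscribe(['clickstream'], on_assign=on_assign, on_revoke=on_revoke)
```

When Consumer B joins, each consumer's on_revoke then on_assign fires, and the printed assignments should match what the watch command in Terminal 3 shows.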


Task 4: Crash Test

Goal: See at-least-once redelivery in action.

Run the dedicated crash simulator — no modification needed:

# Run 1: processes CRASH_AFTER messages, then raises RuntimeError
# The offset is NOT committed — Kafka still thinks these messages are unread
python3 consumer_crash.py

Read the output: how many messages were processed before the crash? What was the committed offset?

# Inspect the lag — should show unread messages despite having processed them
docker exec -it kafka-broker kafka-consumer-groups \
  --bootstrap-server localhost:9092 \
  --group crash-demo-group --describe

Task 4: Run Steps

Run sequence:

  1. Run 1: python3 consumer_crash.py — crashes after 15 messages without committing
  2. Check lag: kafka-consumer-groups ... --group crash-demo-group --describe
  3. Run 2: python3 consumer_crash.py again — the same 15 messages are redelivered
  4. Count: how many total messages were processed across both runs?

Discuss: Why were those 15 messages processed twice? What would make processing idempotent?


Checkpoint

Before moving to the debrief, verify you have completed:

  • [ ] Task 1: producer.py ran successfully; partition offsets discussed
  • [ ] Task 2: consumer.py modified with running totals; restart behavior confirmed
  • [ ] Task 3: Partition rebalancing observed; discussed partition assignment before/after
  • [ ] Task 4: Crash simulation run; at-least-once redelivery observed

Debrief

What you built and what it means


Key Observations

What you built:

  • A producer that guarantees delivery via callbacks and flush()
  • A consumer that controls exactly when offsets advance
  • A consumer group that dynamically rebalances on failure
  • A crash scenario that proves at-least-once redelivery

The core pattern:

poll → process → commit → poll
         ↑
    only commit after
    successful processing

What it means in production:

  • Your processing logic must handle duplicate events gracefully — use an idempotency key (event_id) to detect replays
  • Consumer group rebalancing is automatic — no operator action needed when a pod crashes in Kubernetes
  • Consumer lag (log-end-offset - committed-offset) is your key health metric — high lag = consumer falling behind
  • Partition count is a deployment-time decision — plan for your maximum consumer count before you launch
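The lag metric is simple arithmetic over the numbers kafka-consumer-groups prints. A sketch of the computation (the per-partition values below are made up for illustration):

```python
def consumer_lag(log_end_offset: int, committed_offset: int) -> int:
    """The LAG column: messages produced but not yet committed by the group."""
    return log_end_offset - committed_offset

# Per-partition lag, summed into a topic-level health number:
partitions = [
    {'partition': 0, 'log_end': 40, 'committed': 38},
    {'partition': 1, 'log_end': 33, 'committed': 33},
    {'partition': 2, 'log_end': 27, 'committed': 20},
]
total = sum(consumer_lag(p['log_end'], p['committed']) for p in partitions)
print(total)   # 2 + 0 + 7 = 9
```

In production you would scrape these numbers from the broker and alert when the total grows instead of shrinking.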

What's Missing?

Kafka stores and delivers events at scale — but it wasn't built for everything


The Gaps (1/2)

  • No stateful computations — Kafka delivers raw events; computing a windowed revenue total, a session count, or a stream-stream join requires a separate processing layer (Flink or Spark Structured Streaming) reading from Kafka topics
  • No query interface — you can't ask Kafka "how many purchase events happened in the last 10 minutes for user_42"; you can only consume the stream sequentially and maintain that state yourself
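To answer a question like the one above, you consume the stream and maintain the state yourself. A minimal sketch of a 10-minute tumbling-window purchase count in plain Python (the event shape matches this lab's producer; with no real stream attached, fake timestamps stand in for consumed messages):

```python
from collections import defaultdict

WINDOW_SEC = 600   # 10-minute tumbling windows

# window_start -> user_id -> purchase count
windows = defaultdict(lambda: defaultdict(int))

def record(event):
    """Bucket a purchase event into its tumbling window."""
    if event['event_type'] != 'purchase':
        return
    window = int(event['timestamp'] // WINDOW_SEC) * WINDOW_SEC
    windows[window][event['user_id']] += 1

# Fake events standing in for the consumed stream:
for ts, user in [(1000, 'user_42'), (1100, 'user_42'), (7000, 'user_42')]:
    record({'event_type': 'purchase', 'timestamp': ts, 'user_id': user})

print(windows[600]['user_42'])   # 2 purchases in the window starting at t=600
```

Flink and Spark Structured Streaming exist precisely so you don't hand-roll this state management (plus fault tolerance and watermarks) yourself.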

The Gaps (2/2)

  • Operational surface area — partition count decisions at topic creation time are irreversible (you can add partitions but existing data doesn't redistribute); tuning broker configs for production throughput requires deep expertise
  • Exactly-once is non-trivial — the Task 4 simulation showed at-least-once; achieving true end-to-end exactly-once requires Kafka transactions + idempotent producers + careful downstream coordination

What Comes Next

Gap Solution When
Sub-100ms event processing and richer stateful operators Apache Flink (true streaming) Week 13
Spark-native streaming with DataFrame API, watermarks, and windows Spark Structured Streaming Week 14
Tested SQL transformations and pipeline orchestration dbt + Apache Airflow Week 15

Kafka is the right tool for durable, high-throughput event ingestion, fan-out, and replay — but the moment you need aggregations, joins, or alerts over that stream, you need a processing layer consuming from Kafka.


Speaker Notes

Give 2 minutes for environment check. Identify anyone with issues before starting the CLI demo. The most common issue is ZooKeeper not being ready — docker compose logs zookeeper will show the status.

5 minutes. Emphasize: user_123 appears twice — both records go to the same partition because the partition key is deterministic. Ask students: why would that matter for order processing?

5 minutes. The consumer groups describe output shows the LAG column — how far behind a consumer is. This is the production metric for monitoring consumer health.

linger.ms=5 enables micro-batching — the producer waits 5ms to collect more messages before sending a batch. This improves throughput at the cost of a tiny latency increase.

Walk through each argument to produce(). Students often forget flush() and wonder why some deliveries never get confirmed.

The critical line is enable.auto.commit=False. Without it, Kafka advances offsets on a timer regardless of processing success. With it, we own the commit decision.

This is the core pattern. Draw the timeline: poll → process → commit → poll. Ask: "What if the database write in the processing step fails? What do we want to happen?" They should say: reprocess the batch.

15 minutes. Students often expect 33-33-34 distribution across 3 partitions. In practice, the hash of each user_id is not perfectly balanced — one partition usually gets more. This is a real-world lesson about partition key selection.

20 minutes. The key moment is Run 2 — students see that the consumer picks up exactly where it left off. This is the "committed offset as a bookmark" concept made tangible.

15 minutes. Set session.timeout.ms=10000 in both consumer configs so rebalancing happens in 10 seconds instead of 45 — makes the demo visible.

10 minutes. Expected answer: the batch is redelivered on restart because commit never happened. The LAG column will confirm uncommitted messages. This is the core insight of at-least-once: the batch is the unit of commit, not the individual message.

2-minute checkpoint before debrief. If most students are still on Task 3, skip Task 4 — explain at-least-once redelivery in the debrief instead. If stuck, check: docker compose logs kafka-broker --tail=50

5 minutes. Ask: "If you were building the billing service, how would you make it safe against the Task 4 failure scenario?" They should say: upsert by event_id so duplicate processing produces the same result.