Speaker context: Session 1 gave students the ring architecture and the ALLOW FILTERING error. Today answers "so how do I design a schema that doesn't need ALLOW FILTERING?" — the query-first modeling section is the heart of this session. The three-way comparison at the end is extremely high-value for the final exam and job interviews. The ScyllaDB section is short (~10 min, conceptual) — its main point is that CQL is identical, so "drop-in replacement" is real. Close with the NoSQL selection activity to cement the decision framework.
Let this land before moving on. The failure isn't a bug — it's the database enforcing its data model. Today explains why, and what to do about it.
This framing — "tables are answers" — is the single most important mental model shift of the week. Repeat it verbatim when students ask why they can't just add a WHERE clause.
Quick verbal check: ask the room "what happens if a node is down when a write arrives?" Expect: hinted handoff. If blank stares, spend 2 minutes here before proceeding — the rest of the session depends on this foundation. (~3 minutes)
Emphasize the p99 framing. p50 looks fine on dashboards; p99 is what users actually experience on the unlucky request. SLA contracts are almost always written in p99. The GC problem is why large Cassandra operators either run oversized clusters with excess headroom (Netflix) or migrated to ScyllaDB (Discord). (~5 minutes)
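If the room needs to see the p50-vs-p99 gap rather than hear it, a throwaway simulation works: a small sketch (latency numbers are invented, purely illustrative) where 1% of requests hit a ~200 ms GC-style pause.

```python
import random

random.seed(42)
# 990 requests around 2 ms, plus 10 requests caught behind a ~200 ms GC pause
latencies = [random.uniform(1.5, 2.5) for _ in range(990)]
latencies += [random.uniform(180.0, 220.0) for _ in range(10)]

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = min(len(ordered) - 1, round(p / 100 * len(ordered)))
    return ordered[rank]

p50 = percentile(latencies, 50)   # ~2 ms: the dashboard looks healthy
p99 = percentile(latencies, 99)   # ~200 ms: what the SLA actually measures
```

The point to narrate: the median is completely blind to the pauses; the 99th percentile is dominated by them.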
The key insight is "shared-nothing per core" — contrast with Cassandra's thread pool model where cores contend on shared data structures. Students with OS background will recognize this as the classic lock-contention problem. You don't need to go deep — the takeaway is "C++ + no GC + no cross-core contention = dramatically better tail latency." (~4 minutes)
Demo point: run the exact same CQL from Session 1 on ScyllaDB — it works unchanged. The migration is a data migration (backup/restore), not a code change. Stress "identical CQL" — students often assume switching databases means rewriting application code. Here, the driver configuration changes; the queries do not. (~3 minutes)
This is the conceptual turning point. Students trained on relational design will push back: "but that's duplication!" Yes. Intentional duplication. The quote at the bottom is worth reading aloud. In Cassandra, storage cost is the explicit trade-off you make for O(1) reads. Ask: "what happens in SQL if a query changes?" (Add an index, maybe rewrite). "In Cassandra?" (Design a new table, rewrite the app). Both have costs — Cassandra front-loads them at design time. (~6 minutes)
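For students who need the duplication made concrete, a minimal in-memory sketch (table and field names are invented for illustration) shows the pattern: one logical write lands in every "table" that answers a query, and every read becomes a key lookup.

```python
# The same events, written twice, each copy organized for one query.
events = [
    {"user": "ana", "device": "d1", "ts": 1, "temp": 21.0},
    {"user": "ana", "device": "d2", "ts": 2, "temp": 19.5},
    {"user": "bob", "device": "d1", "ts": 3, "temp": 22.3},
]

readings_by_user = {}    # answers: "readings for a given user"
readings_by_device = {}  # answers: "readings for a given device"

for e in events:
    # one logical write fans out to every table that answers a query about it
    readings_by_user.setdefault(e["user"], []).append(e)
    readings_by_device.setdefault(e["device"], []).append(e)

# each read is now a single key lookup: no scan, no ALLOW FILTERING
ana_readings = readings_by_user["ana"]
d1_readings = readings_by_device["d1"]
```

Storage doubled; read cost collapsed to O(1). That is the trade stated plainly.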
Walk through the "never low-cardinality" rule with a concrete example: if you partition orders by `status` (pending/shipped/delivered), all "pending" orders land on the same token range → one or two nodes handle all pending-order traffic. That's a hotspot. The fix is always to add a high-cardinality column to the partition key. (~5 minutes)
Walk through the "never low-cardinality" rule with a concrete example: if you partition orders by `status` (pending/shipped/delivered), all "pending" orders land on the same token range → one or two nodes handle all pending-order traffic. That's a hotspot. The fix is always to add a high-cardinality column to the partition key. (~5 minutes)
Run this CREATE TABLE live. After creating it, ask: "What if I queried across multiple days — say a week?" Walk through: you'd query 7 partitions (one query per day, or a single `IN` on the date component of the partition key), or use a token range. Neither is as clean as a single-partition query, but it's a predictable access pattern. Bounded partition size is more important than eliminating multi-partition queries. (~6 minutes)
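If it helps to show the fan-out mechanically, a small helper (table and column names assume the session's sensor-readings example; the CQL strings are illustrative text, not executed) builds the seven single-partition queries a week-long read decomposes into:

```python
from datetime import date, timedelta

def week_partitions(device_id, end_day):
    """Build the 7 single-partition queries a week-long read fans out to."""
    queries = []
    for offset in range(7):
        day = end_day - timedelta(days=offset)
        queries.append(
            "SELECT * FROM sensor_readings "
            f"WHERE device_id = '{device_id}' AND reading_date = '{day}'"
        )
    return queries

qs = week_partitions("dev-42", date(2024, 3, 14))
```

Seven known partitions, each bounded to one day of data: more work than one query, but every one of them is a routable single-partition read.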
Run this live and show the result. Then deliberately remove the `reading_date` predicate and show the ALLOW FILTERING error — this is the same error from the driving question. Students should now understand *why* it fails: without the full partition key, Cassandra would have to ask every node. (~5 minutes)
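The "why" can be modeled directly: the token is a hash of the *whole* compound partition key, so a partial key gives the coordinator nothing to route on. A sketch (md5 stands in for Murmur3; column names follow the session's sensor example; the error string is illustrative):

```python
import hashlib

def token(device_id, reading_date):
    """Stand-in for Murmur3 over the full compound partition key."""
    return int(hashlib.md5(f"{device_id}|{reading_date}".encode()).hexdigest(), 16)

def route(device_id=None, reading_date=None, num_nodes=6):
    if device_id is not None and reading_date is not None:
        return f"node for token {token(device_id, reading_date) % num_nodes}"
    # Missing any key component: no token can be computed, so the only
    # option is a cluster-wide scan -- what ALLOW FILTERING would force.
    raise ValueError("InvalidRequest: need full partition key (or ALLOW FILTERING)")

ok = route("dev-42", "2024-03-14")   # full key: routes to one replica set
try:
    route("dev-42")                  # drop reading_date: routing is impossible
    failed = False
except ValueError:
    failed = True
```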
The application-side batch write is important: Cassandra has `BATCH` statements, but they are NOT transactions — they don't guarantee atomicity across tables the way a SQL transaction does. A failed batch may partially write. Application code must handle this (idempotent inserts with `IF NOT EXISTS` or just accept eventual consistency). This is the "you bought write throughput; here's what it costs" moment. (~5 minutes)
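The "what it costs" moment can be acted out in code. A sketch (two dicts stand in for two tables; names and the failure injection are invented) shows a partial batch failure and why a deterministic row id makes the retry safe:

```python
# Application-side "batch" to two tables with no atomicity guarantee.
table_by_user, table_by_device = {}, {}

def write_reading(reading, fail_after_first=False):
    """Insert the same row into both tables. Keyed inserts are idempotent:
    replaying after a partial failure converges to the same final state."""
    key = reading["id"]              # deterministic id, not a fresh one per retry
    table_by_user[(reading["user"], key)] = reading
    if fail_after_first:
        raise IOError("node timeout")  # simulated failure between the two writes
    table_by_device[(reading["device"], key)] = reading

r = {"id": "r-1", "user": "ana", "device": "d1", "temp": 21.0}
try:
    write_reading(r, fail_after_first=True)  # first attempt fails mid-batch
except IOError:
    pass
write_reading(r)   # retry: no duplicates, both tables end up consistent
```

The contrast to stress: had the id been generated per attempt, the retry would have created a second row in the user table instead of converging.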
LWT is a trap: students see "IF NOT EXISTS" and think "great, I'll use this everywhere for safety." Drive home the 4 Paxos round trips — this can reduce write throughput by 10–20× on busy tables. Rule of thumb: LWT for registration/deduplication only. For everything else, design for idempotency instead. TTL demo: insert with TTL 10, then SELECT after 11 seconds to show the row is gone — very effective visually. (~6 minutes)
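If the live TTL demo isn't available, the semantics can be shown with a pure-Python stand-in (not the real cqlsh demo; names and the clock parameter are illustrative):

```python
# TTL semantics: a row carries an expiry time; reads past it see nothing.
rows = {}

def insert_with_ttl(key, value, ttl, now):
    rows[key] = (value, now + ttl)   # store the value with its expiry instant

def select(key, now):
    entry = rows.get(key)
    if entry is None or now >= entry[1]:
        return None                  # expired rows behave as if deleted
    return entry[0]

insert_with_ttl("session:ana", "token-123", ttl=10, now=0)
alive = select("session:ana", now=5)   # within TTL: row visible
gone = select("session:ana", now=11)   # past TTL: row is gone
```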
Collections look convenient but have a pitfall: Cassandra reads the entire collection to update one element. A `LIST` with 10,000 entries that you append to on every event is a performance problem. Rule of thumb: collections work well for small, bounded sets (< 100 elements). For large sets, design a separate child table. (~3 minutes)
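The asymmetry is easy to quantify with a toy cost model (counters and names are invented; it models the note's read-before-write claim, not real server internals):

```python
cost = {"collection_cells_read": 0, "child_rows_read": 0}

big_list = []        # one LIST cell holding every event for a parent row
child_table = {}     # one row per event: (parent_id, seq) -> event

for seq in range(1000):
    # collection pitfall: updating one element first reads the whole cell
    cost["collection_cells_read"] += len(big_list)
    big_list.append(seq)
    # child-table insert: touches only the new row
    cost["child_rows_read"] += 1
    child_table[("sensor-1", seq)] = seq
```

After 1,000 appends the collection model has touched roughly half a million cells versus 1,000 rows for the child table: quadratic versus linear total work.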
This is a common mistake. Secondary indexes look like SQL indexes but behave completely differently: the index is node-local (each node indexes only its own rows), so a query against it fans out to every node simultaneously. For a 100-node cluster, that's 100 parallel partition scans. Materialized views (not covered here) are a better production alternative, but still have trade-offs. The safe rule: new table. (~3 minutes)
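The fan-out can be simulated with node-local shards (node count, data, and the indexed column are invented for illustration):

```python
import hashlib

NUM_NODES = 10
shards = [dict() for _ in range(NUM_NODES)]   # one local index shard per node

def node_for(pk):
    """Place each row on a node by its partition key (md5 as hash stand-in)."""
    return int(hashlib.md5(pk.encode()).hexdigest(), 16) % NUM_NODES

# Rows live on the node their partition key hashes to; each node can only
# index the rows it holds.
for i in range(200):
    pk, city = f"user-{i}", ["paris", "oslo"][i % 2]
    shards[node_for(pk)][pk] = city

def query_by_city(city):
    """Secondary-index read: every node's shard must be consulted."""
    visited, hits = 0, []
    for shard in shards:
        visited += 1
        hits += [pk for pk, c in shard.items() if c == city]
    return visited, hits

visited, hits = query_by_city("oslo")   # visits all NUM_NODES shards, always
```

Contrast: a partition-key read would visit only the replicas for one token; this read visits every node no matter how selective the predicate is.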
Give students 30 seconds to scan the table silently before explaining. The most important row for exams: "Read richness." Redis = only key lookup; MongoDB = any field; Cassandra = must know partition key. That single constraint is why query-first design exists. (~4 minutes)
Frame these as interview/design review talking points, not just course content. A systems design interviewer will give a scenario and ask "which NoSQL?" Students who can articulate Redis = speed, Mongo = flexibility, Cassandra = write scale stand out immediately. Spend time on the boundary cases: "what if I need both fast reads AND rich queries?" (probably MongoDB + Redis caching layer). (~5 minutes)
These scenarios are deliberately ambiguous at the edges — Scenario A has a Redis-vs-Cassandra debate built in; Scenario C has a Redis-vs-ScyllaDB debate. The goal is not a single right answer but justified reasoning. Circulate while groups work and listen for misconceptions: "We'd use Redis for C because it's fastest" → probe: "2M writes/sec × 90 days of data — how much RAM is that?" (~5 minutes active + 3 minutes debrief)
Walk through the boundary cases — production systems often use two databases together (Redis for real-time aggregation, Cassandra for durable storage). That layered architecture is the "real" answer for Scenario C at companies like Twitter or Discord. If time allows, ask: "what would the Cassandra schema for Scenario C look like?" — students should propose `PRIMARY KEY ((campaign_id, date), ts, click_id)`. (~5 minutes)
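If groups struggle to see why the date belongs in the partition key, a small helper (function and field names are invented) makes the key shape concrete:

```python
from datetime import datetime, timezone

def click_keys(campaign_id, click_id, ts):
    """Key shape for Scenario C: partition on (campaign_id, date),
    cluster by (ts, click_id)."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
    partition_key = (campaign_id, day)   # date bucket bounds each partition
    clustering_key = (ts, click_id)      # time-ordered rows within a partition
    return partition_key, clustering_key

pk, ck = click_keys("camp-7", "click-abc", ts=1_700_000_000)
```

The date bucket is doing the same job as `reading_date` in the sensor table: a hot campaign's clicks can't grow one partition forever, because each day starts a new one.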
Pause here and ask the room: "If your team uses Cassandra today and GC pauses are causing SLA violations, what's the migration path to ScyllaDB?" Answer: backup data, restore to ScyllaDB cluster, point application at new cluster. No schema changes, no code changes. That's the value of CQL compatibility. (~2 minutes)
The exam question will give a scenario and ask students to justify a database choice. Reinforce: the answer is always "it depends on the access pattern." A student who says "I'd pick Cassandra because it's fast" without mentioning partition key design or query pattern is not demonstrating the week's learning objective. (~2 minutes)
The first three gaps all point toward Kafka + Spark Structured Streaming, which is Weeks 12–13. The fourth gap (cross-store queries) is worth naming so students know it's a solved problem in industry — mention Trino as the canonical answer — but we won't cover it in the course. (~3 minutes)
Remind students that Assignment 3 is due end of Week 12 — they should start the Cassandra schema design this week, since the modeling skills are already in hand. The Kafka mechanics are taught next week. (~2 minutes)
End by connecting back to the driving question: "We started with ALLOW FILTERING breaking your product query. Query-first design fixed it. But what if the business needs to react the moment an event is written — not query it later?" That's Kafka. Strong closing loop that motivates the entire streaming module. (~2 minutes)