Cassandra — Built for When It Must Not Go Down

CS 6500 — Week 11, Session 1

CS 6500 — Big Data Analytics | Week 11

The Driving Question

"A network partition splits a bank's ATM network into two regions that can't communicate with each other for 30 seconds. If the system prioritizes consistency (CP), all ATMs stop working. If it prioritizes availability (AP), ATMs keep working but might briefly show inconsistent balances. Which failure mode would you rather design around?"


The Cassandra Answer

Cassandra chose availability over consistency by default — and gave operators a dial to tune the trade-off per query.

Session 1: Architecture and Data Model
Ring topology, consistent hashing, and gossip — how Cassandra eliminates single points of failure — then CQL: the SQL that isn't SQL

Session 2: ScyllaDB, Modeling, and Choosing
Query-first schema design, ScyllaDB's C++ rewrite for lower tail latency, and the three-way NoSQL decision framework

The ring architecture Cassandra introduced is now the baseline design for every global-scale database: DynamoDB, CosmosDB, and ScyllaDB all inherited it.


Week 10 Recap

Redis and MongoDB — two answers to the same question: "SQL leaves a gap"

Redis MongoDB
Model Key-value + structures JSON documents
CAP CP (Redis Cluster) CP (replica set primary)
When Sub-ms latency, sessions, pub/sub Flexible schema, rich queries

Today's contrast: MongoDB elects a new primary after failure (seconds of downtime). Cassandra never has a primary — every node is equal.


Ring Architecture

How Cassandra eliminates the single point of failure


The Availability Challenge

Systems with a master node have an Achilles heel

MongoDB replica set:
  [Primary] ← all writes go here
  [Secondary 1]  [Secondary 2]

Primary fails → election takes 10–30 seconds → writes blocked

Cassandra's answer: there is no master.

Cassandra ring:
  [Node A] ── [Node B] ── [Node C] ── [Node D]
     ↑                                    │
     └────────────────────────────────────┘

Any node can accept any write. No election. No downtime.

Consistent Hashing

Every partition key is hashed to a position on a ring (0 → 2¹²⁷ in the original partitioner; the default Murmur3 partitioner uses a signed 64-bit range — the principle is identical)

Token ring (0 → 2¹²⁷, wraps around):

  Node A owns tokens:  0   –  25%
  Node B owns tokens: 25%  –  50%
  Node C owns tokens: 50%  –  75%
  Node D owns tokens: 75%  – 100%

  "user_123" → hash → token 18%  → stored on Node A
  "user_456" → hash → token 61%  → stored on Node C

Why not hash(key) % N?

  • Add a 5th node: % 5 relocates ~80% of all keys
  • Consistent hashing: only ~1/N of keys move to the new node
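To make the contrast concrete, here is an illustrative Python sketch — a toy hash ring using MD5, not Cassandra's actual Murmur3 partitioner — comparing how many keys relocate under each scheme when a fifth node joins:

```python
import hashlib

def token(key: str) -> int:
    # Stable 128-bit hash standing in for a partitioner token (not Murmur3)
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

RING = 2 ** 127
keys = [f"user_{i}" for i in range(10_000)]

# Naive placement: hash % N — adding a 5th node changes most assignments
moved_mod = sum(1 for k in keys if token(k) % 4 != token(k) % 5)

# Consistent hashing: 4 evenly spaced nodes, then add a 5th at the 12.5% mark
old_nodes = [i * RING // 4 for i in range(1, 5)]   # 25%, 50%, 75%, 100%
new_nodes = sorted(old_nodes + [RING // 8])

def owner(tok: int, nodes: list) -> int:
    # First node clockwise from the token owns the key
    return next(n for n in nodes if tok % RING <= n)

moved_ring = sum(
    1 for k in keys
    if owner(token(k), old_nodes) != owner(token(k), new_nodes)
)

print(f"mod-N:      {moved_mod / len(keys):.0%} of keys moved")   # ~80%
print(f"consistent: {moved_ring / len(keys):.0%} of keys moved")  # ~1/8 of keys
```

Only keys falling between the new node and its ring predecessor move — everything else stays put.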

Replication Strategy

Every row is stored on multiple nodes — controlled by replication factor

CREATE KEYSPACE ecommerce
  WITH replication = {
    'class': 'SimpleStrategy',
    'replication_factor': 3
  };
  • RF = 3: each row exists on 3 consecutive nodes on the ring
  • Coordinator receives write → forwards to RF replica nodes
  • Can survive RF − 1 node failures without data loss

Replication: Strategy Selection

Strategy When to use
SimpleStrategy Single datacenter (dev/test)
NetworkTopologyStrategy Multi-DC production (specify RF per DC)

Gossip Protocol

How does every node know who's alive without a master?

Every second, each node randomly contacts 1–3 peers and exchanges state:

Node A → Node C: "I've heard B is slow; D looks healthy"
Node C → Node B: "A says you're slow; I've seen no issues"
Node B → Node A: "I'm fine; here's my updated heartbeat"
  • Cluster state converges in O(log N) gossip rounds
  • No central coordinator → no single point of failure for cluster state
  • Failure detection: node stops responding to gossip → marked DOWN automatically

Practical impact: a Cassandra cluster can lose a node at 3 AM with no pages — the cluster routes around it until the node is replaced.
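The O(log N) convergence claim can be sanity-checked with a toy push-only epidemic simulation (illustrative Python — real gossip exchanges richer state digests, but the spreading dynamics are similar):

```python
import random

def gossip_rounds(n_nodes: int, seed: int = 1) -> int:
    # Each round, every node that already knows about a state change
    # tells one random peer. Count rounds until the whole cluster knows.
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n_nodes:
        for node in list(informed):
            informed.add(rng.randrange(n_nodes))
        rounds += 1
    return rounds

for n in (10, 100, 1000):
    print(f"{n:5d} nodes -> {gossip_rounds(n)} rounds")
```

Rounds grow roughly like log₂(N): a 10× larger cluster only needs a handful of extra gossip rounds.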


Write Path

What happens when a client writes a row?

Client → writes to any node (that node = Coordinator)
              │
              ▼
Coordinator hashes partition key → identifies 3 replica nodes
              │
    ┌─────────┼─────────┐
    ▼         ▼         ▼
 Node A    Node B    Node C
    │         │         │
 1. Commit log (WAL — durable write to disk)
 2. Memtable (in-memory sorted structure)
 3. ACK to coordinator per consistency level
              │
Coordinator → ACK to client

Background (async): memtable flushes to immutable SSTable on disk; compaction merges SSTables over time.
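The three steps above can be sketched in a few lines (a toy in-memory model — real commit logs and SSTables are on-disk structures with checksums, indexes, and bloom filters):

```python
import bisect

commit_log = []   # append-only WAL: sequential disk writes (durability)
memtable = []     # sorted in-memory structure (fast writes)
sstables = []     # immutable sorted runs "on disk"

def write(key, value):
    commit_log.append((key, value))        # 1. durable append
    bisect.insort(memtable, (key, value))  # 2. in-memory sorted insert

def flush():
    # Background: freeze the memtable into an immutable SSTable
    sstables.append(tuple(memtable))
    memtable.clear()

for k, v in [("b", 2), ("a", 1), ("c", 3)]:
    write(k, v)
flush()
print(sstables[0])  # (('a', 1), ('b', 2), ('c', 3))
```

Note that every disk write is an append — this is why Cassandra's write throughput is so high.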


Read Path

What happens when a client reads a row?

Client → Coordinator hashes partition key → contacts replica nodes
              │
    ┌─────────┼─────────┐
    ▼         ▼         ▼
 Node A    Node B    Node C
    │
Fastest replica responds to coordinator → returns to client

Background read repair: if replicas disagree (e.g., one replica missed a recent write or delete), Cassandra merges by write timestamp (last-write-wins) — the most recently written value wins.

Write-heavy implication: reads are slightly more expensive than writes because Cassandra may need to merge the memtable with multiple SSTables on disk (mitigated by choosing an appropriate compaction strategy).
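The timestamp merge is easy to sketch (illustrative Python; each column value is modeled as a (value, write_timestamp) pair):

```python
# One logical row's columns, scattered across a memtable and two SSTables.
sstable_1 = {"total": (249.99, 100), "status": ("pending", 100)}
sstable_2 = {"status": ("paid", 200)}
memtable  = {"status": ("shipped", 300)}

merged = {}
for fragment in (sstable_1, sstable_2, memtable):
    for col, (value, ts) in fragment.items():
        # Last-write-wins: keep the value with the newest timestamp
        if col not in merged or ts > merged[col][1]:
            merged[col] = (value, ts)

print({col: value for col, (value, _) in merged.items()})
# {'total': 249.99, 'status': 'shipped'}
```

The read path pays this merge cost on every request; compaction shrinks the number of fragments over time.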


Tunable Consistency

One cluster, many trade-offs — per query


Consistency Levels

Both reads and writes independently configurable — per query

Level Replicas must respond Notes
ONE 1 Fastest; stale reads possible
TWO 2 Rarely used directly
QUORUM RF/2 + 1 majority Common production choice
LOCAL_QUORUM Majority within local DC Multi-DC best practice
ALL All RF nodes Strongest; cluster degraded on any failure
-- Per-query consistency (cqlsh; application drivers set this per statement)
CONSISTENCY QUORUM;
SELECT * FROM orders_by_customer WHERE customer_id = :id;

Strong Consistency Formula

R + W > RF guarantees that the replicas consulted on a read overlap the replicas that acknowledged the write in at least one node

RF Write CL W nodes Read CL R nodes R+W Strong?
3 QUORUM 2 QUORUM 2 4 ✅ yes
3 ONE 1 ONE 1 2 ❌ no
3 ALL 3 ONE 1 4 ✅ yes (but ALL writes block on any node failure)
5 QUORUM 3 QUORUM 3 6 ✅ yes
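The table above reduces to a one-line predicate; a small checker (illustrative Python) makes it easy to try other RF/CL combinations:

```python
def quorum(rf: int) -> int:
    # Majority of replicas: floor(RF / 2) + 1
    return rf // 2 + 1

def replicas_for(level: str, rf: int) -> int:
    return {"ONE": 1, "TWO": 2, "QUORUM": quorum(rf), "ALL": rf}[level]

def strongly_consistent(rf: int, write_cl: str, read_cl: str) -> bool:
    # R + W > RF guarantees the read and write replica sets overlap
    return replicas_for(write_cl, rf) + replicas_for(read_cl, rf) > rf

print(strongly_consistent(3, "QUORUM", "QUORUM"))  # True  (2 + 2 > 3)
print(strongly_consistent(3, "ONE", "ONE"))        # False (1 + 1 = 2)
print(strongly_consistent(3, "ALL", "ONE"))        # True  (3 + 1 > 3)
print(strongly_consistent(5, "QUORUM", "QUORUM"))  # True  (3 + 3 > 5)
print(strongly_consistent(3, "ONE", "QUORUM"))     # False (1 + 2 = 3, not > RF)
```

The last line is the Write=ONE + Read=QUORUM production pattern: fast, but deliberately not strongly consistent.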

Consistency: Production Pattern

Common production pattern: Write=ONE + Read=QUORUM

  • Fast writes (ack from 1 replica)
  • Consistent reads (majority checked)
  • R + W = 1 + 2 = 3 = RF → not strongly consistent (requires R+W > RF); trades correctness for write speed

Cassandra Is AP — but Tunable

Default behavior (ONE/ONE): AP — prefers availability during partition

Network partition splits cluster:
  Side A: [Node 1, Node 2]   Side B: [Node 3]

  MongoDB CP: Node 3 refuses writes (can't reach majority)
  Cassandra AP: Both sides keep accepting writes
               → reconciled by last-write-wins on recovery

Tuning toward CP (QUORUM/QUORUM): strongly consistent in practice (R + W > RF) — but writes block if fewer than ⌊RF/2⌋ + 1 replicas are reachable.

Design rule: choose AP when downtime costs more than temporary inconsistency (IoT telemetry, clickstreams, global leaderboards). Choose CP-mode when stale reads are unacceptable (financial balances — though most fintech uses dedicated CP databases for this).


When to Use Cassandra

Use Cassandra when…

  • Write throughput is the primary constraint (millions of events/sec)
  • The system must stay up during network partitions or node failures
  • Query patterns are known and stable at design time
  • Data is time-series, event logs, or user activity (high write, low read complexity)
  • You need a multi-datacenter active-active deployment

Examples: IoT sensor ingest, clickstream logging, global leaderboards, messaging metadata, time-series metrics

Avoid Cassandra when…

  • Query patterns are exploratory or change frequently (use a data warehouse)
  • You need joins, aggregations across partitions, or ad-hoc analytics
  • Strong consistency is non-negotiable and latency budget is tight (use PostgreSQL or a CP store)
  • Data volume is modest — Cassandra's operational overhead is high for small datasets
  • You need transactions across multiple rows or tables

Examples: financial ledger, order management requiring ACID, reporting dashboards


HBase

Same data model, opposite CAP choice


What Is HBase?

A wide-column store built on top of HDFS — random read/write access into Hadoop data

Row key: "user_123"
  Column family: profile      Column family: activity
    name       → "Alice"        last_login  → "2025-03-14"
    email      → "a@co.com"     page_views  → "4821"
    city       → "Chicago"      purchases   → "17"
  • Wide-column model: rows have a row key; columns are grouped into column families; each cell is versioned by timestamp
  • Sparse by design: columns are dynamic — a row only stores the column qualifiers it actually has; no nulls, no wasted space
  • Built on HDFS: data files live in HDFS; HBase adds its own indexed file format (HFile) and a write-ahead log to support random access
  • CP database: each region is served by exactly one RegionServer, with the HMaster coordinating assignment — strong consistency, no eventual reconciliation
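The cell model can be sketched with a plain dictionary (illustrative Python — real HBase stores cells in sorted HFiles, but the addressing is the same: row key + column family + column qualifier + timestamp):

```python
cells = {}

def put(row, family, qualifier, value, ts):
    # A cell is addressed by (row key, column family, column qualifier);
    # each write adds a new timestamped version rather than overwriting.
    cells.setdefault((row, family, qualifier), []).append((ts, value))

put("user_123", "profile", "name", "Alice", 1)
put("user_123", "activity", "page_views", "4820", 1)
put("user_123", "activity", "page_views", "4821", 2)  # new version; old kept

# Latest version of a cell = highest timestamp
latest = max(cells[("user_123", "activity", "page_views")])
print(latest)  # (2, '4821')

# Sparse by design: a qualifier that was never written simply has no cell
print(("user_123", "profile", "email") in cells)  # False
```

Nothing is stored for absent columns — this is what "sparse by design" means in practice.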

What Is HBase?

HBase is Google Bigtable's open-source descendant — the same model that inspired Cassandra, but with opposite availability trade-offs.


HBase vs. Cassandra

Both are wide-column stores. The architectures couldn't be more different.

Cassandra (AP)

  • Leaderless ring — every node equal
  • Keeps accepting writes during a partition
  • Eventual consistency by default
  • Built for global-scale, high-availability workloads

HBase (CP)

  • Each region has a single owning RegionServer (the HMaster assigns regions)
  • Pauses writes to affected regions during failover — never accepts divergent writes
  • Strong consistency guaranteed
  • Built on HDFS — lives inside the Hadoop ecosystem

The rule: if your data already lives in HDFS and you need random read/write access with strong consistency — HBase. If you need always-on availability across datacenters — Cassandra.


When HBase Wins

HBase's niche: Hadoop-native storage with strong consistency

Scenario Why HBase Why Not Cassandra
Random reads into a 10 TB HDFS dataset Reads from HDFS directly; no ETL Cassandra stores its own data; can't query HDFS
Sparse rows (most columns empty per row) Column qualifiers are dynamic; only non-null values stored Cassandra tables have fixed columns
MapReduce / Spark jobs that also need point lookups Native Hadoop integration Requires separate connector
Strong consistency required (no "last-write-wins") CP: master ensures one truth AP by default; reconciles after the fact

Industry context: HBase was inspired by Google's Bigtable paper and was foundational at Facebook (Messages). Today Cassandra/ScyllaDB dominate new wide-column deployments — but HBase remains the answer when Hadoop integration is non-negotiable.


CQL — The SQL That Isn't

Tables, partition keys, and why WHERE status = 'pending' fails


CQL vs SQL

CQL looks familiar — but the constraints are fundamentally different

Feature SQL (Postgres) CQL (Cassandra)
Joins Yes ❌ None
Subqueries Yes ❌ None
Arbitrary WHERE Yes ❌ Partition key required
Aggregation across partitions Yes ❌ Very limited
Schema evolution ALTER TABLE ALTER TABLE (limited)
Updates In-place row update Upsert (INSERT = UPDATE)

The mental shift: in SQL, you design tables to model data. In CQL, you design tables to serve specific query patterns — one table per query.


Table Anatomy

Three types of columns — each with a distinct role

CREATE TABLE orders_by_customer (
    customer_id  UUID,       ← Partition key: which node?
    order_date   DATE,       ← Clustering column: sort order on disk
    order_id     UUID,       ← Clustering column: uniqueness within partition
    total        DECIMAL,    ← Regular column: data payload
    status       TEXT,       ← Regular column: data payload
    PRIMARY KEY ((customer_id), order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC, order_id ASC);
Column type Purpose Queryable?
Partition key (customer_id) Routes to node(s) Required in every WHERE
Clustering columns order_date, order_id Sort order within partition Range queries allowed
Regular columns total, status Data stored in row Only with ALLOW FILTERING

Demo: Connect

# Connect to the Cassandra container
docker exec -it cassandra cqlsh

# Check cluster health from outside
docker exec -it cassandra nodetool status
-- Verify you're in
DESCRIBE KEYSPACES;

-- Create our working keyspace
CREATE KEYSPACE IF NOT EXISTS ecommerce
  WITH replication = {
    'class': 'SimpleStrategy',
    'replication_factor': 1
  };

USE ecommerce;

Demo: Create Table/Insert

-- Table optimized for: "all orders for customer X, newest first"
CREATE TABLE IF NOT EXISTS orders_by_customer (
    customer_id  UUID,
    order_date   DATE,
    order_id     UUID,
    total        DECIMAL,
    status       TEXT,
    PRIMARY KEY ((customer_id), order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC, order_id ASC);

-- Insert rows
INSERT INTO orders_by_customer (customer_id, order_date, order_id, total, status)
VALUES (11111111-1111-1111-1111-111111111111, '2025-03-15', uuid(), 249.99, 'completed');

INSERT INTO orders_by_customer (customer_id, order_date, order_id, total, status)
VALUES (11111111-1111-1111-1111-111111111111, '2025-02-20', uuid(), 89.00, 'completed');

Demo: Query Patterns

-- ✅ Efficient: partition key in WHERE
SELECT * FROM orders_by_customer
WHERE customer_id = 11111111-1111-1111-1111-111111111111;

-- ✅ Efficient: range on clustering column
SELECT * FROM orders_by_customer
WHERE customer_id = 11111111-1111-1111-1111-111111111111
  AND order_date >= '2025-03-01';

-- ❌ Fails — no partition key
SELECT * FROM orders_by_customer WHERE status = 'completed';
-- ERROR: Cannot execute this query as it might involve data filtering

-- ⚠ ALLOW FILTERING: full cluster scan — never in production
SELECT * FROM orders_by_customer WHERE status = 'completed' ALLOW FILTERING;

ALLOW FILTERING

What ALLOW FILTERING does:

Without partition key → Cassandra doesn't know which nodes hold the data
→ Must contact EVERY node in the cluster
→ Each node scans its entire SSTable for matching rows
→ O(total rows in cluster) for every query
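A toy model (illustrative Python) shows the asymmetry: a partition-key lookup touches one partition, while a filter on a non-key column must examine every row:

```python
rows = [
    {"customer_id": f"c{i % 1000}", "status": "completed" if i % 3 else "pending"}
    for i in range(100_000)
]

# Partition-key lookup: rows are already grouped by partition key,
# so the coordinator reads exactly one partition
by_customer = {}
for r in rows:
    by_customer.setdefault(r["customer_id"], []).append(r)
hit = by_customer["c42"]

# ALLOW FILTERING: no partition key, so every row must be examined
pending = [r for r in rows if r["status"] == "pending"]

print(len(hit), len(pending))  # one small partition vs a full scan
```

The scan cost grows with total data size; the partition lookup does not — that gap only widens as the cluster grows.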

ALLOW FILTERING: The Fix

Design a table for the query instead

-- New table: optimized for "all orders with a given status"
CREATE TABLE orders_by_status (
    status       TEXT,
    order_date   DATE,
    order_id     UUID,
    customer_id  UUID,
    total        DECIMAL,
    PRIMARY KEY ((status), order_date, order_id)
) WITH CLUSTERING ORDER BY (order_date DESC);

-- Application writes to BOTH tables on every order event

Disk is cheap. Cluster-wide scans are not.
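The dual-write pattern can be sketched with in-memory stand-ins for the two tables (illustrative Python — a real application would issue two INSERT statements through the driver):

```python
from collections import defaultdict

# Toy stand-ins for the two denormalized tables
orders_by_customer = defaultdict(list)  # partition key: customer_id
orders_by_status = defaultdict(list)    # partition key: status

def place_order(customer_id, order_id, order_date, total, status):
    # One logical event, two writes — denormalization by design
    row = {"order_id": order_id, "order_date": order_date,
           "customer_id": customer_id, "total": total, "status": status}
    orders_by_customer[customer_id].append(row)
    orders_by_status[status].append(row)

place_order("cust-1", "ord-1", "2025-03-15", 249.99, "completed")
place_order("cust-1", "ord-2", "2025-03-16", 89.00, "pending")
place_order("cust-2", "ord-3", "2025-03-16", 12.50, "pending")

# Both query patterns are now single-partition lookups
print(len(orders_by_customer["cust-1"]))  # 2
print(len(orders_by_status["pending"]))   # 2
```

The application owns the consistency between the two tables — Cassandra does not keep them in sync for you.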


Activity

Partition key design — pairs


Activity: Design the Right PRIMARY KEY

Pairs | 8 minutes

A social media platform stores posts. The required query patterns are:

  1. All posts by a specific user, newest first
  2. All posts by a specific user on a specific date

Activity: Evaluate the Options

For each design below, decide: Works / Fails / Works but has a flaw

-- Option A
PRIMARY KEY (post_id)

-- Option B
PRIMARY KEY ((user_id), created_at)
-- With CLUSTERING ORDER BY (created_at DESC)

-- Option C
PRIMARY KEY ((user_id, created_date), created_at)
-- With CLUSTERING ORDER BY (created_at DESC)

-- Option D
PRIMARY KEY ((user_id), created_at, post_id)
-- With CLUSTERING ORDER BY (created_at DESC)

Activity Debrief

Option Query 1 Query 2 Notes
A (post_id) ❌ Fails — no user_id ❌ Fails Optimizes for "look up one post by ID" only
B ((user_id), created_at) ✅ Works ✅ Works But created_at alone is not unique — two posts at same second collide
C ((user_id, created_date), created_at) ❌ Fails — must supply date for Query 1 ✅ Works Bounds partition size (good!) but breaks Query 1
D ((user_id), created_at, post_id) ✅ Works ✅ Works Best choice — unique, ordered, partition bounded by user

Key insight: Option D is best — user_id partitions evenly, created_at enables range queries, post_id ensures uniqueness within the same second.
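The Option B collision can be demonstrated directly (illustrative Python, modeling a partition as a dict keyed by its clustering columns — Cassandra INSERTs are upserts, so a duplicate key silently overwrites):

```python
partition_b = {}  # Option B: clustering key = (created_at) only
partition_d = {}  # Option D: clustering key = (created_at, post_id)

posts = [
    {"post_id": "p1", "created_at": "2025-03-15T10:00:00", "text": "first"},
    {"post_id": "p2", "created_at": "2025-03-15T10:00:00", "text": "second"},  # same second
]

for p in posts:
    partition_b[p["created_at"]] = p                  # upsert: p2 silently replaces p1
    partition_d[(p["created_at"], p["post_id"])] = p  # both rows survive

print(len(partition_b), len(partition_d))  # 1 2
```

No error is raised in the collision case — the data loss is completely silent, which is why Option D's extra clustering column matters.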


Activity 2

Cassandra case study — pairs


Activity: Real-World Cassandra

Pairs | 10 minutes

Pick one of the companies below. Look up how they use Cassandra and answer the three questions on the next slide.

Company Known for
Netflix Replaced a single Oracle DB with Cassandra for streaming metadata
Apple Runs one of the largest Cassandra deployments in the world (Siri, iCloud)
Discord Stores billions of messages; migrated away from Cassandra to ScyllaDB
Uber Uses Cassandra for geospatial and trip data at global scale
Instagram Stores user feed and activity data at hundreds of millions of users

Activity: Case Study Questions

Answer these for your company:

  1. What data does Cassandra store for them? (not just "messages" — what is the schema shape: time-series? user-keyed? event log?)

  2. Which Cassandra properties made it the right choice? (high write throughput? AP availability? multi-DC? known query patterns?)

  3. What was the trade-off or pain point they hit? (modeling complexity? hotspots? eventual consistency bugs? operational cost?)

Be ready to share one sentence per question with the class.


Case Study Debrief

One pair per company — 30 seconds each

Listen for patterns across all five:

  • Did every company use Cassandra for high-write, time-series-like data?
  • Did any company hit the query-pattern rigidity problem?
  • Did any company mention partition design as a source of pain?

The through-line: Cassandra earns its place when write volume and availability requirements are non-negotiable. The companies that struggled were the ones that reached for it before exhausting simpler options.


Session 1 Key Takeaways

  • Ring architecture — every node is equal; no master means no single point of failure
  • Consistent hashing — adding nodes moves only ~1/N of data, not all of it
  • Tunable consistency — same cluster, different trade-offs per query via R + W > RF
  • Partition key is everything — it determines which node, enables all queries, and must be in every WHERE clause
  • One query, one table — ALLOW FILTERING is a full cluster scan; the fix is a new table

Next session: ScyllaDB's C++ rewrite, query-first modeling patterns, and the three-way Redis / MongoDB / Cassandra decision framework


What's Missing?

Cassandra eliminates the single point of failure — but every query still needs a table designed for it


The Gaps

  • Query patterns must be known at design time — every new access pattern may require a new table; there is no "just add a WHERE clause" escape hatch; post-launch schema pivots require full data migrations
  • No ad-hoc analytics — a product manager's question like "show me all orders over $500 in March" either needs a pre-built table or ALLOW FILTERING; Cassandra was never designed to replace a data warehouse
  • Hotspot partitions — a low-cardinality partition key (e.g., status TEXT) creates a few enormous partitions that overwhelm the nodes responsible for them; detecting and fixing hotspots after launch is painful
  • Modeling mistakes are expensive — choosing the wrong partition key requires rewriting the table definition and migrating all data; unlike MongoDB's flexible documents, CQL schema changes are costly
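The hotspot problem is easy to quantify with a sketch (illustrative Python): a low-cardinality partition key concentrates rows in a few huge partitions, while a high-cardinality key spreads them evenly:

```python
from collections import Counter

orders = [
    {"customer_id": f"c{i}", "status": "completed" if i % 20 else "pending"}
    for i in range(10_000)
]

# Low-cardinality partition key (status): a couple of enormous partitions
by_status = Counter(o["status"] for o in orders)
# High-cardinality partition key (customer_id): many small, even partitions
by_customer = Counter(o["customer_id"] for o in orders)

print(dict(by_status))            # a few partitions hold everything
print(max(by_customer.values()))  # largest partition stays tiny
```

The nodes owning the "completed" partition would absorb almost all traffic — that is the hotspot.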

What Comes Next

Gap Solution When
Rules for choosing partition keys Query-first data modeling Week 11, Session 2
JVM GC pauses causing tail latency spikes ScyllaDB — C++ rewrite, shared-nothing Week 11, Session 2
Continuous data ingestion at millions of events/sec Apache Kafka — distributed event streaming Week 12

Cassandra is the right tool for high-write, high-availability workloads where query patterns are known — but when data arrives as a continuous stream rather than discrete writes, a different architecture wins.


Speaker context: Students just finished MongoDB (Week 10 Session 2) and understand CP document stores. Today flips the CAP triangle: Cassandra defaults to AP — it keeps accepting writes during a partition, at the cost of possible momentary inconsistency. Lead with the "single master = single point of failure" problem before showing the ring. Demo-first works well: show the ALLOW FILTERING error early so the schema design motivation is visceral.

Quick recap slide — 2 minutes. The key contrast to plant in students' minds: both Redis and MongoDB chose consistency (CP); today flips to availability (AP). Ask: "What did MongoDB do when the primary failed?" Students should recall the election window. That 10–30 second window is the exact problem Cassandra's design eliminates.

Ask students: "If there's no primary, how does a client know which node to talk to?" They'll guess wrong — let them. The answer is: any node. The client connects to any node in the cluster; that node becomes the coordinator for that request and routes it to the correct replicas. This is the leaderless architecture insight.

Board demo: Draw a clock face. Place Node A at 12, B at 3, C at 6, D at 9. Scatter 8 "rows" around the clock — each goes to the nearest clockwise node. Add a node at 1:30 — only rows between 12 and 1:30 move. Then contrast: if you had 4 buckets (% 4) and add a 5th (% 5), almost every row changes bucket.

Emphasize that RF is set at the keyspace level, not per-table. RF=1 (class default) means no redundancy — fine for learning, never for production. RF=3 is the standard production choice because it survives one node failure while QUORUM reads/writes still function.

NetworkTopologyStrategy lets you set different RFs per datacenter: e.g., RF=3 in US-East, RF=2 in EU-West. This is the only correct choice in production — SimpleStrategy doesn't understand rack/DC topology, so it may place all three replicas on nodes in the same rack, defeating the purpose of RF=3.

Students often confuse gossip with a health-check service (like a load balancer ping). The key difference: gossip is decentralized — there is no "health check server" to fail. Each node independently builds a picture of cluster state. The O(log N) convergence means a 100-node cluster reaches consistent cluster state in roughly 7 rounds of gossip (~7 seconds). Mention `nodetool gossipinfo` for debugging.

Key point: the commit log ensures durability (survives node restart). The memtable enables fast writes. SSTables are immutable — no in-place updates. This is why Cassandra's write throughput is so high: sequential disk writes only.

Common misconception: students expect reads to be fast like a hash table lookup. Explain that because SSTables are immutable and writes never update in place, a single logical row may have pieces scattered across several SSTables. The read path merges them using timestamps (last-write-wins). Bloom filters eliminate most disk reads for missing keys. Compaction periodically merges SSTables to reduce read amplification — this is a background cost, not a request-path cost.

Spend a moment on why per-query configurability matters: a clickstream write (IoT sensor event) is fine at ONE — losing one event is acceptable and speed matters. An order total read for a payment confirmation might need QUORUM. Same cluster, same table, different trade-off. This tunability is what distinguishes Cassandra from databases where consistency is a cluster-wide setting.

Walk through the formula: R + W > RF means at least one node that confirmed the write must participate in any subsequent read. With RF=3, QUORUM writes go to 2 nodes and QUORUM reads check 2 nodes — the overlap guarantees one node saw both. The ALL write row is a useful trap: ALL write + ONE read is technically strongly consistent, but ALL writes block on any single node failure, which usually defeats the purpose of Cassandra.

The Write=ONE + Read=QUORUM pattern is popular in write-heavy workloads (metrics, logs) where you want fast ingest but need reliable reads. LOCAL_QUORUM is the multi-DC variant — it only requires a majority within the local datacenter, avoiding cross-DC latency on every write. This is what most production Cassandra deployments use.

This is a good place to address the "but my bank uses Cassandra" question — it often does, but for session state and transaction logs, not the authoritative balance. The canonical balance lives in a CP store (often a relational DB with serializable transactions). Cassandra's role is absorbing high-volume event writes cheaply. Push students to articulate the failure mode they're optimizing for before choosing a consistency level.

This is the "so what do I actually do?" slide students are waiting for. Reinforce the pattern: Cassandra is a write-optimized, availability-first store for high-volume workloads with known access patterns. The most common mistake is reaching for Cassandra because it's "scalable" when a Postgres instance would handle the load for years. Cassandra's operational cost (schema migrations, hotspot detection, compaction tuning) is significant — it should earn its place in the architecture.

Students need this context before the comparison slide. The key insight: HBase didn't replace HDFS — it sits *on top* of it. Your MapReduce or Spark jobs can read HDFS files directly; HBase adds the ability to do row-key lookups into that same data without a full scan. The column family concept is the source of most confusion — stress that column families are defined at schema creation time (like table columns), but the column qualifiers within a family are dynamic (any row can have any qualifier). A cell is identified by: row key + column family + column qualifier + timestamp.

Students often ask why HBase exists if Cassandra is strictly better. It isn't. HBase's superpower is sitting inside the Hadoop ecosystem — if your ETL pipeline produces HDFS files and you need point lookups into that data, HBase is far simpler than loading everything into Cassandra. HBase also handles extremely sparse data efficiently (column qualifiers are dynamic, not fixed schema). Keep this to 3–5 minutes; Assignment 3 focuses on Cassandra, not HBase.

Keep this to 5 minutes. The point is conceptual: same data model, opposite CAP position, different ecosystem home. Students don't implement HBase in this course — the assignment removed it in favor of depth in Cassandra/MongoDB.

The "CQL looks like SQL" surface similarity is a trap. Students who treat it like SQL will immediately hit errors (no joins, can't filter on non-key columns). Spend a moment on each row: "Joins — why not?" (data is distributed across nodes; a join would require shipping data between nodes, killing the performance advantage). "Arbitrary WHERE — why not?" (without a partition key, Cassandra doesn't know which nodes hold matching rows). These aren't limitations from laziness — they're intentional constraints that enable horizontal scale.

The double parentheses in `PRIMARY KEY ((customer_id), order_date, order_id)` confuse students. Outer parens = PRIMARY KEY clause. Inner parens = composite partition key boundary. Single-column partition key: `PRIMARY KEY (customer_id, order_date)` — here customer_id is partition key, order_date is clustering. Composite partition: `PRIMARY KEY ((customer_id, store_id), order_date)` — both columns together form the partition key. Draw this on the board if students look confused.

Demo tip: RF=1 for class to avoid needing multiple containers. Production would use RF=3 minimum. If nodetool status shows "UN" (Up/Normal), the node is healthy.

Have students run these commands themselves. `uuid()` generates a random UUID client-side — point out that unlike a SQL SERIAL primary key, there is no auto-increment in Cassandra. UUIDs are the standard PK choice because they distribute evenly across the ring. Add 3–4 more rows with different customer_ids before the query demo so range queries return visible results.

Let students hit the error. The error message is a feature: Cassandra refuses to let you accidentally do something O(total rows). ALLOW FILTERING exists for development/debugging only.

The error message Cassandra throws is deliberately informative: "Cannot execute this query as it might involve data filtering and thus may have unpredictable performance." It's telling you exactly what ALLOW FILTERING does. Treating the error as a bug to suppress (by adding ALLOW FILTERING) is the mistake. Treating it as a design signal is the right response: "this query pattern needs a dedicated table."

The application-level dual-write pattern is the standard solution: when an order is placed, write to both `orders_by_customer` and `orders_by_status` in the same request. This is denormalization by design. Preview that Cassandra Lightweight Transactions (LWT) exist for atomic operations, but they're slow (Paxos under the hood) and should be rare. The common pattern is application-managed consistency through dual writes.

8 minutes

Give pairs 8 minutes. Circulate and listen for the Option B collision discussion — many students won't notice that two posts at the same second collapse into one row. Also watch for confusion between the composite partition key in Option C and a single partition key. The goal is to surface the intuition that clustering columns provide both ordering and uniqueness guarantees.

Let pairs discuss before revealing the debrief slide. If most groups converge on D, great — ask them to articulate *why* C fails for Query 1. If groups are split between B and D, the collision scenario is the teaching moment: INSERT two posts for the same user at the same second with Option B and show that the second insert silently overwrites the first (Cassandra INSERTs are upserts).

Close the debrief by connecting back to the session theme: "We designed this table backwards from the query — we started with what the application needs to read, then built the schema." This is query-first design. Session 2 formalizes this into a full modeling workflow. Option C is worth dwelling on: the composite partition key `(user_id, created_date)` is actually a valid pattern for bounding partition size on high-volume users (e.g., a celebrity with millions of posts) — it just sacrifices Query 1 without also supplying the date.

10 minutes

10 minutes. Students should search "[company] Cassandra engineering blog" — most have published detailed posts. Discord's migration to ScyllaDB is especially rich: their blog post "How Discord Stores Billions of Messages" explains the hotspot problem directly. Netflix and Apple have DataStax Summit talks on YouTube. If students finish early, push them to find the actual partition key design the company uses.

Keep debrief tight — 5 minutes max. The goal is pattern recognition, not deep dives. If Discord came up, note that ScyllaDB is a drop-in Cassandra replacement (same CQL, same drivers) — that's exactly what Session 2 covers.

Assignment 3 is due at the end of Week 12. Students will design Cassandra schemas as part of it — this session is the foundation.

One-minute close. Kafka (Week 12) is the natural next step: Cassandra absorbs individual writes well, but when producers generate millions of events per second from many sources simultaneously, you need a buffer layer between producers and the database. That's Kafka's job — Cassandra is often the sink at the end of a Kafka pipeline.