MongoDB — Documents, Queries, and Aggregation

CS 6500 — Week 10, Session 2

Session 1 Recap

Redis — the key ideas:

  • In-memory → sub-millisecond reads
  • 5 data structures match 5 access patterns
  • Cache-aside: populate on miss; TTL manages staleness
  • Pub/Sub: decouple producers from consumers

The question Redis leaves open:

"Redis is fast — why not store everything there?"

  • Memory-bound (RAM is expensive at scale)
  • No rich queries — no WHERE, no JOIN, no aggregations
  • No durable persistence for complex data

Today: MongoDB — flexible schema + rich queries + disk persistence

CAP Theorem

In any distributed system, you can only guarantee 2 of 3:

Property Meaning
Consistency (C) Every read sees the most recent write
Availability (A) Every request gets a (non-error) response
Partition Tolerance (P) System continues despite network failures

Network partitions are unavoidable in distributed systems — so the real choice is C vs. A when a partition occurs.

  • CP systems: Return an error rather than stale data (e.g., HBase, ZooKeeper)
  • AP systems: Return stale data rather than an error (e.g., Cassandra, DynamoDB)
  • MongoDB replica sets: CP by default (reads from primary); configurable to AP with readPreference: secondary

CAP — Concrete Example

Scenario: 3-node MongoDB replica set. Network partition isolates the primary.

[Node A: Primary] ~~~ ✂ partition ✂ ~~~ [Node B: Secondary]
                                         [Node C: Secondary]

CP behavior (default):

  • Reads fail (or wait) until a new primary is elected
  • Guarantees you never see stale data
  • You sacrifice Availability

AP behavior (readPreference: secondary):

  • Reads succeed — from B or C
  • Data may be seconds behind the last primary write
  • You sacrifice Consistency

CAP trade-offs only matter during partitions. Most of the time, distributed databases give you both C and A.

Documents as the Unit of Storage

MongoDB stores data as BSON (Binary JSON) documents

{
  "_id": ObjectId("64a1b2c3d4e5f67890123456"),
  "name": "Alice Chen",
  "age": 29,
  "email": "alice@example.com",
  "tags": ["python", "spark", "kafka"],
  "address": {
    "city": "Toledo",
    "state": "OH",
    "zip": "43601"
  },
  "created_at": ISODate("2025-03-15T09:30:00Z")
}

Key differences from RDBMS rows:

  • Nested objects and arrays are first-class
  • No enforced schema — documents in the same collection can have different fields
  • _id is always present (auto-generated if omitted)

Collections vs. Tables

RDBMS concept MongoDB equivalent
Database Database
Table Collection
Row Document
Column Field
Primary key _id field
Foreign key Reference (manual) or embedded sub-document
JOIN $lookup pipeline stage
Index Index
Schema No enforced schema (optional validation)

The critical difference: Collections don't enforce a schema. Every document can look different. This is a superpower AND a footgun.
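To make the footgun concrete, here is a plain JavaScript sketch (an in-memory stand-in for a collection — no MongoDB driver, and the sample documents are made up) showing how document shapes can drift apart and why readers must code defensively:

```javascript
// Conceptual sketch: a collection is just a list of documents,
// and nothing stops their shapes from drifting apart.
const users = [
  { _id: 1, name: "Alice", age: 29 },
  { _id: 2, name: "Bob", email: "bob@example.com" },  // no age field at all
  { _id: 3, name: "Carol", age: "27" }                // age stored as a string!
];

// A reader that assumes `age` is always a number will misbehave,
// so defensive code checks both presence and type.
const avgAge = docs => {
  const nums = docs.map(d => d.age).filter(a => typeof a === "number");
  return nums.reduce((s, a) => s + a, 0) / nums.length;
};

console.log(avgAge(users)); // only Alice's numeric age survives the filter
```

Optional schema validation (MongoDB's `$jsonSchema` validator) can rule out documents like Bob's and Carol's — but only if you opt in.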

Schema Design: Embed vs. Reference

The central design decision in MongoDB

Embed sub-documents when:

  • Data is always accessed together (read in one query)
  • 1-to-few relationship (not hundreds of nested items)
  • Nested data doesn't need to be queried independently

{ "_id": 1, "name": "Alice",
  "addresses": [
    { "type": "home", "city": "Toledo" },
    { "type": "work", "city": "Columbus" }
  ]
}

Schema Design: Embed vs. Reference (cont.)

Reference separate documents when:

  • Data is accessed independently
  • 1-to-many (hundreds or thousands of items)
  • Data is shared across multiple parent documents
  • Document size could exceed MongoDB's 16 MB limit

// users collection
{ "_id": 1, "name": "Alice", "order_ids": [101, 102, 103] }

// orders collection
{ "_id": 101, "user_id": 1, "total": 450, "items": [...] }

Rule of thumb:

  • Embed for reads you need fast
  • Reference to avoid duplication and handle unbounded growth
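The read-cost difference between the two patterns can be sketched in plain JavaScript (in-memory stand-ins for the two collections — no driver, illustrative data only):

```javascript
// Embedded: the whole user, addresses included, comes back in ONE lookup.
const usersEmbedded = [
  { _id: 1, name: "Alice", addresses: [{ type: "home", city: "Toledo" }] }
];

// Referenced: fetching a user's orders takes a SECOND lookup per order id.
const users  = [{ _id: 1, name: "Alice", order_ids: [101, 102] }];
const orders = [
  { _id: 101, user_id: 1, total: 450 },
  { _id: 102, user_id: 1, total: 35 }
];

const findById = (coll, id) => coll.find(d => d._id === id);

// One read — addresses are already there:
const alice = findById(usersEmbedded, 1);

// Two-step read — this extra round trip is the price of referencing:
const aliceRef    = findById(users, 1);
const aliceOrders = aliceRef.order_ids.map(id => findById(orders, id));

console.log(alice.addresses.length, aliceOrders.length); // 1 2
```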

Embed vs. Reference — Decision Guide

Scenario Recommendation
Blog post with author name Embed author name (denormalize)
Blog post with all comments Reference — comments can grow unbounded
Order with line items (1–50) Embed — always loaded together
User with all orders (lifetime) Reference — could be thousands
Product with category (shared) Reference — shared across many products
Invoice with billing address Embed — a snapshot that must not change if the customer later updates their address

Bad schema design in MongoDB = slow aggregations and repeated $lookup calls. Design for your queries, not your data.

MongoDB CRUD Operations

// Connect
docker exec -it mongodb mongosh
use ecommerce

// CREATE
db.users.insertOne({ name: "Alice", age: 29, tags: ["python", "spark"] })
db.users.insertMany([
  { name: "Bob", age: 34, tags: ["java"] },
  { name: "Carol", age: 27, tags: ["python", "kafka"] }
])

// READ
db.users.find({ age: { $gt: 25 } })                    // All users > 25
db.users.find({ age: { $gt: 25 } }, { name: 1, age: 1 }) // Project fields
db.users.findOne({ name: "Alice" })                     // First match only

// UPDATE
db.users.updateOne({ name: "Alice" }, { $set: { age: 30 }, $push: { tags: "kafka" } })
db.users.updateMany({ age: { $lt: 30 } }, { $set: { status: "junior" } })

// DELETE
db.users.deleteOne({ name: "Alice" })

Query Operators

// Comparison
{ age: { $gt: 25, $lte: 40 } }       // 25 < age <= 40
{ status: { $in: ["active", "pending"] } }
{ name: { $ne: "Alice" } }

// Logical
{ $and: [{ age: { $gt: 20 } }, { status: "active" }] }
{ $or:  [{ status: "vip" }, { balance: { $gt: 1000 } }] }

// Array operators
{ tags: "python" }                    // tags array contains "python"
{ tags: { $all: ["python", "spark"] } }  // contains all
{ tags: { $size: 3 } }               // exactly 3 elements

// Nested field
{ "address.city": "Toledo" }          // dot notation for nesting

The Aggregation Pipeline

MongoDB's answer to SQL GROUP BY, JOIN, and analytics

Pipeline stages are chained — output of one becomes input of the next:

db.orders.aggregate([
  { $match:   { status: "completed" } },           // Filter rows (WHERE)
  { $group:   { _id: "$category",                  // GROUP BY category
                total: { $sum: "$amount" },
                count: { $sum: 1 } } },
  { $sort:    { total: -1 } },                     // ORDER BY total DESC
  { $limit:   5 },                                 // LIMIT 5
  { $project: { category: "$_id",                  // SELECT (rename fields)
                total: 1, count: 1, _id: 0 } }
])

SQL equivalent:

SELECT category, SUM(amount) AS total, COUNT(*) AS count
FROM orders WHERE status = 'completed'
GROUP BY category ORDER BY total DESC LIMIT 5
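To see what each stage passes to the next, here is the same pipeline hand-translated into plain JavaScript array operations (a conceptual sketch over made-up sample data — not how the server actually executes it):

```javascript
const orders = [
  { category: "Electronics", amount: 450, status: "completed" },
  { category: "Books",       amount: 35,  status: "completed" },
  { category: "Electronics", amount: 800, status: "completed" },
  { category: "Books",       amount: 200, status: "pending" }
];

// $match — keep only completed orders (filter early, shrink the stream)
const matched = orders.filter(o => o.status === "completed");

// $group — accumulate a running total and count per category key
const groups = {};
for (const o of matched) {
  const g = groups[o.category] ?? (groups[o.category] = { _id: o.category, total: 0, count: 0 });
  g.total += o.amount;
  g.count += 1;
}

// $sort (total desc) → $limit 5 → $project (rename _id to category)
const result = Object.values(groups)
  .sort((a, b) => b.total - a.total)
  .slice(0, 5)
  .map(g => ({ category: g._id, total: g.total, count: g.count }));

console.log(result);
// [ { category: 'Electronics', total: 1250, count: 2 },
//   { category: 'Books', total: 35, count: 1 } ]
```

Moving $match after $group would force grouping over pending orders too — same reason you filter before aggregating in SQL.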

Key Pipeline Stages

Stage Purpose SQL Equivalent
$match Filter documents WHERE
$group Aggregate by key GROUP BY
$sort Order results ORDER BY
$limit Cap result count LIMIT
$project Select / rename fields SELECT
$lookup Join another collection JOIN
$unwind Flatten array field (unnest / lateral join)
$addFields Compute new fields (computed column)
$count Count matching docs COUNT(*)

Demo: Aggregation on the E-Commerce Dataset

# Load sample data
docker exec -it mongodb mongosh --eval "
  db = db.getSiblingDB('ecommerce');
  db.orders.insertMany([
    {customer:'Alice', category:'Electronics', amount:450, status:'completed', month:1},
    {customer:'Bob',   category:'Books',       amount:35,  status:'completed', month:1},
    {customer:'Alice', category:'Electronics', amount:800, status:'completed', month:2},
    {customer:'Carol', category:'Clothing',    amount:120, status:'completed', month:1},
    {customer:'Bob',   category:'Electronics', amount:650, status:'pending',   month:1},
    {customer:'Carol', category:'Books',       amount:200, status:'completed', month:2}
  ])
"
// Top categories by total revenue (completed orders only)
db.orders.aggregate([
  { $match:  { status: "completed" } },
  { $group:  { _id: "$category", revenue: { $sum: "$amount" }, orders: { $sum: 1 } } },
  { $sort:   { revenue: -1 } },
  { $project: { category: "$_id", revenue: 1, orders: 1, _id: 0 } }
])

Demo: $lookup — Joining Collections

// Add a customers collection
db.customers.insertMany([
  { _id: "Alice", tier: "premium", country: "US" },
  { _id: "Bob",   tier: "basic",   country: "US" },
  { _id: "Carol", tier: "premium", country: "CA" }
])

// Join orders with customer details
db.orders.aggregate([
  { $match: { status: "completed" } },
  { $lookup: {
      from: "customers",       // collection to join
      localField: "customer",  // field in orders
      foreignField: "_id",     // field in customers
      as: "customer_info"      // new array field
  }},
  { $unwind: "$customer_info" },  // flatten the 1-element array
  { $project: {
      customer: 1, amount: 1, category: 1,
      tier: "$customer_info.tier", _id: 0
  }}
])
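Conceptually, $lookup followed by $unwind is a left join that first wraps matches in an array field and then flattens it. A plain JavaScript sketch (in-memory data, no driver — the collections mirror the demo above):

```javascript
const orders = [
  { customer: "Alice", amount: 450, category: "Electronics" },
  { customer: "Bob",   amount: 35,  category: "Books" }
];
const customers = [
  { _id: "Alice", tier: "premium" },
  { _id: "Bob",   tier: "basic" }
];

// $lookup: for each order, collect matching customers into an ARRAY field
const looked = orders.map(o => ({
  ...o,
  customer_info: customers.filter(c => c._id === o.customer)
}));

// $unwind: one output document per array element
// (by default this DROPS orders whose array came back empty)
const unwound = looked.flatMap(o =>
  o.customer_info.map(ci => ({ ...o, customer_info: ci }))
);

// $project: pull the tier up to the top level
const final = unwound.map(o => ({
  customer: o.customer, amount: o.amount, tier: o.customer_info.tier
}));

console.log(final);
// [ { customer: 'Alice', amount: 450, tier: 'premium' },
//   { customer: 'Bob', amount: 35, tier: 'basic' } ]
```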

Hands-On: MongoDB Analytics Lab (20 min)

Individual | Docker MongoDB container

Using the ecommerce database from the demo:

Task 1: Find all orders over $200 placed in month 1 — return customer name and amount

Task 2: Aggregation — total revenue per customer (completed orders only)

Task 3: $lookup — enrich each order with the customer's tier from the customers collection

Task 4 (Design): You're building a blog platform with:

  • Users, Posts, Comments, Tags

Propose a MongoDB schema. For each collection, show a JSON example and justify each embed vs. reference decision.

Deliverable: Shell commands for Tasks 1–3 + text schema diagram for Task 4

Activity Debrief: Blog Schema Design

A reasonable approach:

// posts collection — embed what you always need
{ "_id": 1, "title": "Redis in Production",
  "author": { "id": 42, "name": "Alice" },  // denormalized name
  "tags": ["redis", "caching", "performance"],
  "comment_count": 47,
  "created_at": "2025-03-01T10:00:00Z" }

// comments collection — reference parent post
{ "_id": 201, "post_id": 1, "author_id": 99,
  "text": "Great article!", "created_at": "2025-03-02T08:15:00Z" }

Key decisions:

  • Author name embedded in post (fast display; stale risk is acceptable)
  • Comments referenced — unbounded growth, paginated independently
  • Tags embedded — small, always loaded with post, queried together
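The "paginated independently" point is the payoff of referencing: comments are fetched a page at a time by post_id. A minimal sketch (plain JavaScript stand-in with invented data; in mongosh this is the shape of find({post_id: 1}).sort({_id: -1}).skip(10).limit(10)):

```javascript
// 47 fake comments on post 1, with ascending _ids 200..246
const comments = Array.from({ length: 47 }, (_, i) => ({
  _id: 200 + i, post_id: 1, text: `comment ${i}`
}));

// One page of a post's comments, newest _id first
const page = (postId, pageNum, size) =>
  comments
    .filter(c => c.post_id === postId)   // find({ post_id })
    .sort((a, b) => b._id - a._id)       // sort({ _id: -1 })
    .slice(pageNum * size, (pageNum + 1) * size); // skip + limit

const page2 = page(1, 1, 10);            // second page, 10 per page
console.log(page2.length, page2[0]._id); // 10 236
```

Embedding all 47 comments in the post would force loading every one of them just to render page 2.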

When to Use What

Scenario Redis MongoDB PostgreSQL
Session storage ✅ First choice ❌ Overkill ❌ Too slow
Product catalog ❌ No rich queries ✅ Flexible schema ✅ If structured
Real-time leaderboard ✅ Sorted sets ❌ Too slow ❌ Too slow
Complex joins ❌ Not designed for ⚠️ $lookup (limited) ✅ First choice
Write-heavy IoT stream ❌ Memory limits ⚠️ Possible ❌ Bottleneck
Financial records ❌ No ACID ⚠️ ACID since 4.0, but slower ✅ First choice
Social graph ❌ No traversal ⚠️ Reference chains ⚠️ Recursive CTEs (awkward)

Key insight: Production systems use all three — each handles what it does best.

MongoDB in Production: Key Considerations

Indexing — queries without indexes do full collection scans:

db.orders.createIndex({ customer: 1 })           // Single field
db.orders.createIndex({ status: 1, amount: -1 }) // Compound
db.orders.createIndex({ category: "text" })       // Full-text search
db.orders.explain("executionStats").find(...)     // Query plan inspection

Document size limit: 16 MB per document — the hard limit that forces referencing for large arrays.

Transactions: MongoDB 4.0+ supports multi-document ACID transactions — but they're slower. Design schemas to avoid needing them.

Replication: Replica sets provide high availability (primary + secondaries). Automatic failover in ~10 seconds.

Common Misconceptions — Addressed

"MongoDB is schema-less so you don't need to design schemas"

  • Wrong. Schema design is MORE important in MongoDB.
  • Bad embedding = 16 MB limit hit, slow scans, difficult updates.
  • Always design for your queries first.

"$lookup is as fast as a SQL JOIN"

  • Wrong. $lookup does not use indexes on the joined collection efficiently in all cases.
  • If you $lookup constantly, consider whether referencing was the right choice.

"CAP means I have to pick 2 and ignore the 3rd"

  • Wrong. CAP trade-offs only matter during network partitions.
  • Most of the time, distributed databases provide both C and A.

Week 10 Summary

Redis:

  • In-memory data structure server: String, Hash, List, Set, Sorted Set
  • Cache-aside and write-through patterns; TTL + LRU eviction
  • Pub/Sub for real-time messaging

MongoDB:

  • Document store: flexible BSON, no enforced schema
  • Schema design: embed (fast reads) vs. reference (avoid duplication)
  • Aggregation pipeline: $match → $group → $sort → $project → $lookup

CAP Theorem: Trade-off is C vs. A during partitions — not a permanent choice.

Quiz 10 is now open on Canvas — due before Week 11.

Homework (Due Before Week 11)

Assignment A — Redis Caching Layer

  • Python script implementing cache-aside with 60-second TTL
  • Run 1000 random lookups, measure hit rate
  • Experiment: change TTL to 10s, explain hit-rate difference
  • Deliverable: redis_cache.py + redis_analysis.md

Assignment B — MongoDB Schema Design

  • Design a music streaming platform schema (Users, Songs, Playlists, Play events)
  • Justify each embed vs. reference decision
  • Write 3 aggregation pipelines: top songs, revenue by tier, avg session length by device
  • Deliverable: music_schema.md

Preview: Week 11 — Cassandra/ScyllaDB

Next week: Wide-column stores for globally distributed, write-heavy workloads

  • Cassandra: AP system — always available, eventually consistent
  • ScyllaDB: Drop-in Cassandra replacement written in C++, lower tail latency
  • CQL: Cassandra Query Language — SQL-like but query-first design required
  • Ring architecture: Consistent hashing, no single point of failure

Key question to think about:

"You have 1 billion IoT sensor readings per day. They need to be written fast and queried by device + time range. Redis? MongoDB? Or something else?"

See you next week.

Speaker context: Students just finished hands-on Redis. Now we shift to MongoDB — flexible schema, rich queries, disk persistence. The key tensions to surface: (1) schema design matters MORE in MongoDB than SQL, not less; (2) embedding vs. referencing is the central decision; (3) CAP theorem becomes concrete here. The $lookup lab task often runs over — have a pre-run version ready to demo. The schema design challenge (Task 4) has no single right answer; grade on justification.

Run this live. Walk through each stage, pausing to explain what the intermediate result looks like. Ask: "What would happen if I moved $match after $group?"

Task 4 discussion points: comments can grow unbounded → reference. Tags can be embedded if the count stays small (< 20). Author name is often embedded (denormalized) on posts for fast display.