UW–Madison · CS 544 · Spring 2026
Figure I · Tokens, walks, replicas, quorums

The Cassandra ring, by the walk.

Cassandra is taught as a ring. It’s clearer as a line that wraps. Below, the ring unrolls — and from there we follow a single key all the way through: where it lands, where its replicas live, what happens when machines die, and why R + W > RF is not the same as strong consistency.

§ I.1 The ring is just a line that wraps

Cassandra’s token space is a finite range of integers. Every diagram draws it as a circle so that the largest token wraps back to the smallest. The circle does one job: it makes the wrap-around visible. Everything else is easier on a line.

Press Unwrap. The ring rolls flat into a horizontal number line, the four nodes slide into place, and from here on out we’ll think of the token space as a line — with the understanding that it wraps at the right end.

Figure I.1 From ring to row
Read this: the four dots are the same four tokens whether they sit on a circle or a line. The wrap arrow at the right end is what we have to remember once we’re on the line.
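
In code, the wrap is one modular subtraction. A minimal sketch (Python), assuming a toy 0–99 token space; Cassandra's Murmur3 partitioner really uses signed 64-bit tokens, but the wrap behaves the same way:

    # Toy token space: 0..99. Real Cassandra uses signed 64-bit tokens,
    # but wrapping at the right end works identically.
    SPACE = 100

    def walk_distance(a, b):
        """How far we walk going right from token a to token b,
        wrapping at the right end of the line back to the left."""
        return (b - a) % SPACE

    print(walk_distance(10, 30))  # 20: an ordinary walk along the line
    print(walk_distance(90, 5))   # 15: falls off the right end and wraps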

§ I.2 From a row to a token to a node

A row’s partition key is hashed into one number — its token. To find the row’s owner, start at that token and walk right along the line until you hit a node’s token. If you fall off the right end, you wrap to the left and keep walking.

Type a key below and press Hash → walk. Try user:42, then user:42 again — same key always lands at the same token, with the same owner. Try zzz to land near the right end and watch the wrap.

Figure I.2 A key, hashed and walked
Controls: row key · Hash → walk
Read this: the dark dot is where the key landed; the arrow is the walk. If the walk has to fall off the right end and reappear on the left, you’ll see it loop below the line.
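
The same walk as a sketch. The four node tokens and the md5-based hash below are readable stand-ins, not Cassandra's real partitioner:

    import bisect
    import hashlib

    SPACE = 100  # toy token space, as in Figure I.1

    # A hypothetical 4-node cluster, one token per node (vnodes come next).
    RING = sorted([(12, "A"), (37, "B"), (61, "C"), (88, "D")])
    TOKENS = [tok for tok, _ in RING]

    def token_for(key):
        """Hash a partition key to one number, its token. md5 is a
        readable stand-in for Cassandra's Murmur3 partitioner."""
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big") % SPACE

    def owner(key):
        """Walk right: the owner is the first node token >= the key's
        token; falling off the right end wraps back to index 0."""
        i = bisect.bisect_left(TOKENS, token_for(key))
        return RING[i % len(RING)][1]

    print(owner("user:42"), owner("user:42"))  # same key, same owner, every time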

§ I.3 Vnodes, and what happens when a node joins

With one token per node, adding a new node would mean handing over half of one neighbor's key range — a lot of data streamed from a single machine, and the load stays wildly uneven. Cassandra fixes this with vnodes: each physical node owns many tokens scattered across the line. Adding a new node sprinkles its vnodes into the gaps, and only the tiny segments that now end at those new vnodes (the stretch just to the left of each new tick) change owner.

Press Add a node below to drop a fifth node’s vnodes into the line. Notice that most of the line keeps the color it already had.

Figure I.3 A node joins the cluster
Controls: Add a node
Read this: every vertical tick is a vnode, colored by the physical node it belongs to. The shaded band underneath shows which node owns each segment. Adding a new node only changes the bands that end at the new ticks, just to their left.
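
The claim that most of the line keeps its owner is easy to check numerically. A sketch under toy assumptions (a 0–999 space, 8 vnodes per node; real clusters set the vnode count with num_tokens):

    import bisect
    import random

    SPACE = 1000  # toy token space
    random.seed(0)

    def owners(ring):
        """Owner of every point in the space; ring is sorted (token, node) pairs."""
        keys = [tok for tok, _ in ring]
        return [ring[bisect.bisect_left(keys, t) % len(ring)][1] for t in range(SPACE)]

    # Four nodes with 8 vnodes each, all tokens distinct.
    tokens = random.sample(range(SPACE), 40)
    ring = sorted((t, "ABCD"[i % 4]) for i, t in enumerate(tokens[:32]))
    before = owners(ring)

    ring = sorted(ring + [(t, "E") for t in tokens[32:]])  # node E joins: 8 new ticks
    after = owners(ring)

    moved = sum(b != a for b, a in zip(before, after)) / SPACE
    print(f"{moved:.0%} of the token space changed owner")  # roughly E's fair share

Roughly a fifth of the space moves, and it moves as thin slices that end at E's new ticks, so the streaming load is spread over several donor nodes instead of one.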

§ I.4 Replication: just keep walking

A replication factor of RF means a row is stored on RF distinct physical nodes. The first owner is found by the walk from § I.2. The next RF − 1 replicas are found by continuing the walk and picking each new physical node we encounter. Vnodes that belong to a node we’ve already counted are skipped.

Figure I.4 A key, replicated
Controls: row key · RF 3
Read this: the walk picks RF distinct physical nodes (haloed). Vnodes whose color we’ve already collected are crossed out — they don’t count, and the walk continues past them.
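
The skip rule is a few lines on top of the walk from § I.2. The ring below is hypothetical (three physical nodes, a few vnodes each, toy 0–99 space):

    import bisect

    RING = sorted([(5, "A"), (18, "B"), (23, "A"), (31, "A"),
                   (40, "C"), (57, "B"), (66, "A"), (81, "C"), (93, "B")])
    TOKENS = [tok for tok, _ in RING]

    def replicas(key_token, rf):
        """Walk right from the key's token, keeping the first rf DISTINCT
        physical nodes; vnodes of nodes already chosen are skipped."""
        start = bisect.bisect_left(TOKENS, key_token)
        chosen = []
        for step in range(len(RING)):            # one full lap is always enough
            node = RING[(start + step) % len(RING)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == rf:
                    break
        return chosen

    print(replicas(20, 3))  # ['A', 'C', 'B']: the vnode at 31 belongs to A
                            # again, so it is crossed out and the walk goes on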

§ I.5 Failure: which rows go missing

With RF = 1, every node is the only home for some range of keys. Kill a node and that whole range is gone. With RF = 3, a row is stored on three distinct nodes, so one failure loses nothing — you’d have to lose every replica of a row for that row to be unreachable.

Tick the boxes below to kill nodes. Slide the RF slider and watch the orange shading shrink: replication is what makes data survive failures, and the walk’s skip-rule is what makes replication actually use distinct machines.

Figure I.5 Failure domains: who has every replica?
Controls: RF 1 · kill-node checkboxes · keys-lost readout (0%)
Read this: tick a box to kill a node; the orange shading marks every key whose entire replica set is now dead. Bump RF up and the same kills affect less of the line.
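
The orange shading is computable: a key is lost exactly when every node in its replica set is dead. A sketch reusing the skip-walk, on a hypothetical four-node ring:

    import bisect

    SPACE = 100
    RING = sorted([(5, "A"), (18, "B"), (29, "C"), (42, "D"),
                   (55, "A"), (63, "B"), (74, "C"), (88, "D")])
    TOKENS = [tok for tok, _ in RING]

    def replicas(key_token, rf):
        """The skip-walk from Figure I.4: first rf distinct nodes going right."""
        start = bisect.bisect_left(TOKENS, key_token)
        chosen = []
        for step in range(len(RING)):
            node = RING[(start + step) % len(RING)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == rf:
                    break
        return chosen

    def keys_lost(dead, rf):
        """Fraction of the space whose ENTIRE replica set is dead."""
        return sum(set(replicas(t, rf)) <= dead for t in range(SPACE)) / SPACE

    print(keys_lost({"A"}, rf=1))            # A's whole share of the line is gone
    print(keys_lost({"A"}, rf=3))            # 0.0: two other replicas survive
    print(keys_lost({"A", "B", "C"}, rf=3))  # > 0: some rows lose all three replicas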

§ I.6 Quorums, and what they don't promise

If a write reaches W replicas and a read reaches R replicas, and R + W > RF, then the read's replica set must overlap the write's replica set on at least one node. At least one replica the read contacts holds the write, and Cassandra's timestamp-based last-write-wins reconciliation returns the newer version. That's the read-your-write guarantee.
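
The overlap is a pigeonhole fact, small enough to verify exhaustively. A sketch in pure combinatorics, with no Cassandra specifics:

    from itertools import combinations

    def always_overlap(rf, w, r):
        """Does every W-subset of the RF replicas intersect every R-subset?"""
        nodes = range(rf)
        return all(set(ws) & set(rs)
                   for ws in combinations(nodes, w)
                   for rs in combinations(nodes, r))

    print(always_overlap(3, 2, 2))  # True:  2 + 2 > 3, overlap is guaranteed
    print(always_overlap(3, 1, 2))  # False: 1 + 2 = 3, a read can miss the write

The pigeonhole argument is the same: W + R choices from only RF slots must reuse at least one slot.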

It is not the same as strong consistency. If the cluster was partitioned during the writes, replicas can have legitimately diverged values — different writes on different replicas — and the read still finds them and silently picks one. Press Run conflict scenario to walk through the canonical case (RF=2, W=1, R=2) and watch it happen.

Figure I.6 A row, a quorum, a partition
Controls: RF 2 (fixed) · W 1 · R 2 · replicas alive · Manual / Auto
Read this: each replica is a small box holding the current value of one row’s two columns. Writes try to reach W live replicas; reads try to reach R. The conflict scenario plays out the textbook case: partitioned writes leave replicas that disagree, and the merge is the only thing that picks a winner.
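
And the conflict scenario itself, as a sketch: two writers on opposite sides of a partition each satisfy W = 1, and the R = 2 read that follows can only reconcile by timestamp. (Replica names and values are made up, and the row is simplified to a single column.)

    # RF = 2: one row, two replicas, each holding (value, timestamp).
    replica = {"X": None, "Y": None}

    def write(node, value, ts):
        """W = 1: reaching a single live replica counts as success."""
        replica[node] = (value, ts)

    def read_r2():
        """R = 2: contact both replicas, then last-write-wins on timestamp."""
        versions = [v for v in replica.values() if v is not None]
        return max(versions, key=lambda v: v[1])  # newest ts wins, silently

    # During the partition, each writer can reach only its own side:
    write("X", "email=ana@a.example", ts=100)  # client 1 succeeds on X
    write("Y", "email=ana@b.example", ts=101)  # client 2 succeeds on Y

    # Partition heals. R + W = 3 > RF = 2, so the read finds both values;
    # both are legitimate writes, and the merge just picks the newer one.
    print(read_r2())  # ('email=ana@b.example', 101)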

§ I.7 The rules to remember

token space     : an integer range, drawn as a ring (it's a line that wraps)
walk            : right (clockwise), wrapping at the right end
owner           : first node-token ≥ key's token
vnodes          : many tokens per physical node, for even balance
replicas        : keep walking, pick the next RF distinct nodes
QUORUM(RF)      : floor(RF / 2) + 1
R + W > RF      : read-your-write guarantee (NOT strong consistency)
under partition : replicas may diverge; LWW silently picks one
