Consistency Models
Two people split a bill. One taps "pay," sees the balance drop, then refreshes and sees the old balance again. Money looks like it moved and then un-moved. Nothing was actually lost, but the system showed two truths a second apart, and now a customer is filing a support ticket. That gap between "what one node thinks is true" and "what another node shows" is the entire subject of consistency models. The moment your data lives on more than one machine, you have to decide exactly which version of reality a reader is allowed to see.
A consistency model is the contract your storage layer makes about the order and visibility of reads and writes. Pick a strong model and every reader sees a single, agreed-upon history, but you pay in latency and availability. Pick a weak model and you get speed and the ability to keep serving during a network partition, but you accept that two readers can briefly disagree. This category walks the full ladder, from the strongest guarantee (linearizability) down through sequential, causal, and the session guarantees, then into the database-transaction side (snapshot isolation, serializability) and the conflict-free approaches (CRDTs, operational transformation) that let collaborative apps merge edits without a coordinator. It also covers the durability machinery that makes any of these guarantees survivable: write-ahead logging, checkpointing, snapshotting, and point-in-time recovery.
What a Consistency Model Actually Promises
A consistency model is a precise statement about what a reader is allowed to observe given the writes that have happened. It is not about whether data is correct. It is about ordering and visibility. The strongest model, linearizability, says every operation appears to take effect at a single instant between its start and finish, so once any read has returned a new value, no read that starts afterward can return the old one. The system behaves as if there were one copy of the data and one global clock, even though there are many copies spread across machines.
Sequential consistency relaxes the global-clock part. All operations still appear in some single total order that every process agrees on, and each process's own operations keep their program order, but that order does not have to match real wall-clock time. Causal consistency relaxes further: it only guarantees that operations which are causally related (a reply must come after the message it answers) are seen in the right order everywhere. Unrelated operations can be seen in different orders by different readers, which is usually fine and much cheaper to provide.
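One common way to track causal relationships is a vector clock, which keeps one counter per node. The sketch below is a minimal illustration and not any particular database's implementation; the `compare` function and the node names are hypothetical. It classifies two events as causally ordered or concurrent, which is the distinction causal consistency depends on.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node id -> number of events seen from that node


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify two events as 'before', 'after', 'equal', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a causally precedes b
    if b_le_a:
        return "after"       # b causally precedes a
    return "concurrent"      # neither saw the other; any order is allowed


# Alice posts a message, Bob replies after seeing it, Carol posts independently.
message = {"alice": 1}
reply = {"alice": 1, "bob": 1}
unrelated = {"carol": 1}

message_vs_reply = compare(message, reply)
message_vs_unrelated = compare(message, unrelated)
```

A causally consistent store must show `message` before `reply` everywhere, but it is free to show `unrelated` before or after either of them.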
Below those sit the session guarantees, which scope the promise to a single user's session rather than the whole system. Read-your-writes means you always see your own latest update. Monotonic reads means time never goes backward for you, so you will not see a value disappear after you have already seen it. Monotonic writes means your own writes are applied in the order you issued them. These are the guarantees users actually notice, and they are often the practical sweet spot.
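One way to provide read-your-writes and monotonic reads is for the client session to remember the highest version it has written or read, and to refuse answers from replicas that are behind it. The following is a minimal sketch under that assumption. `Replica`, `Session`, and the single-primary write path are hypothetical names chosen for illustration, not a real client library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    version: int = 0
    value: Optional[str] = None

    def apply(self, version: int, value: Optional[str]) -> None:
        # Only move forward; ignore older or duplicate updates.
        if version > self.version:
            self.version, self.value = version, value


@dataclass
class Session:
    replicas: List[Replica]   # assumption: replicas[0] is the primary
    min_version: int = 0      # highest version this session has written or read

    def write(self, value: str) -> int:
        primary = self.replicas[0]
        new_version = primary.version + 1
        primary.apply(new_version, value)
        self.min_version = max(self.min_version, new_version)
        return new_version

    def read(self) -> Optional[str]:
        # Prefer followers, but skip any replica older than what we have seen.
        for replica in reversed(self.replicas):
            if replica.version >= self.min_version:
                self.min_version = replica.version  # monotonic reads
                return replica.value
        raise RuntimeError("no replica is fresh enough for this session")


primary, follower = Replica(), Replica()
alice = Session([primary, follower])
alice.write("balance=90")
own_read = alice.read()           # follower is stale, so the primary answers

bob = Session([primary, follower])
stale_read = bob.read()           # a different session may still see old data

follower.apply(primary.version, primary.value)   # replication catches up
later_read = alice.read()         # now the follower is fresh enough
```

The guarantee is scoped to the session: `alice` never sees her write vanish, while `bob` has no such promise until he has observed the newer version himself.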
Consistency in Databases: Isolation and Serializability
Distributed-systems consistency and database isolation describe the same problem from two angles. Where consistency models talk about replicas, transaction isolation talks about concurrent transactions touching shared rows. Serializability is the gold standard: the result of running transactions concurrently is identical to some order in which they ran one at a time. No lost updates, no anomalies, no surprises. It is the transactional cousin of linearizability and it is the easiest model to reason about, which is why it is worth the cost when money or inventory is involved.
Snapshot isolation is the model many production databases actually provide, sometimes under another name (PostgreSQL's REPEATABLE READ level, for example, is implemented as snapshot isolation), and it is one many engineers run without realizing it. Each transaction reads from a consistent snapshot taken at its start, so reads never block writes and writes never block reads. It eliminates most anomalies but allows write skew, where two transactions each read a shared state, each decide their write is safe, and together violate an invariant that neither broke alone. Knowing that snapshot isolation is not serializable, and knowing exactly which anomaly slips through, is the difference between a system that quietly corrupts data under load and one that does not.
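Write skew is easier to see in a toy model. The sketch below simulates snapshot isolation with a first-committer-wins rule that only checks keys a transaction wrote. `SnapshotStore`, `Transaction`, and the on-call scenario are hypothetical and heavily simplified; no real engine is implemented this way. The invariant is that at least one doctor stays on call.

```python
from typing import Dict


class SnapshotStore:
    def __init__(self, data: Dict[str, bool]):
        self.data = dict(data)
        self.last_commit = {key: 0 for key in data}  # commit number per key
        self.clock = 0

    def begin(self) -> "Transaction":
        return Transaction(self)


class Transaction:
    def __init__(self, store: SnapshotStore):
        self.store = store
        self.start = store.clock
        self.snapshot = dict(store.data)   # consistent view as of start
        self.writes: Dict[str, bool] = {}

    def read(self, key: str) -> bool:
        return self.writes.get(key, self.snapshot[key])

    def write(self, key: str, value: bool) -> None:
        self.writes[key] = value

    def commit(self) -> bool:
        # Abort only if a key we WROTE was committed after our snapshot.
        # Keys we merely read are not checked, which is what permits write skew.
        if any(self.store.last_commit[k] > self.start for k in self.writes):
            return False
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.last_commit[key] = self.store.clock
        return True


def go_off_call(txn: Transaction, doctor: str) -> None:
    on_call = sum(txn.read(d) for d in ("alice", "bob"))
    if on_call >= 2:            # looks safe from inside this snapshot
        txn.write(doctor, False)


store = SnapshotStore({"alice": True, "bob": True})
t1, t2 = store.begin(), store.begin()
go_off_call(t1, "alice")
go_off_call(t2, "bob")
committed = (t1.commit(), t2.commit())
doctors_on_call = sum(store.data.values())
```

Both transactions commit because their write sets are disjoint, yet nobody is left on call. A serializable engine would have aborted one of them.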
Underneath all of these guarantees is durability, and durability is its own set of lessons here. Write-ahead logging records each change in a durable log before the data itself is modified, so after a crash mid-write the log can be replayed to redo committed work or roll back incomplete work cleanly. Checkpointing and snapshotting bound how much of that log has to be replayed after a restart. Point-in-time recovery uses the log plus a base snapshot to rebuild state as it existed at any chosen moment, which is what turns an accidental mass-delete into a recoverable incident rather than a resume-updating event.
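The log-first discipline can be shown in a few lines. This is a minimal sketch, assuming a JSON-lines log file and an in-memory dictionary as the "database"; `TinyWAL`, `replay`, and the `lsn` field are hypothetical names. It omits checkpoints, torn-record handling, and transactions, and it treats an empty state as the base snapshot.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


class TinyWAL:
    def __init__(self, path: str):
        self.path = path
        self.state: Dict[str, Any] = {}
        self.lsn = 0  # log sequence number of the last record written

    def set(self, key: str, value: Any) -> None:
        self.lsn += 1
        record = {"lsn": self.lsn, "key": key, "value": value}
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())   # the record is on disk before state changes
        self.state[key] = value      # only now touch the data itself


def replay(path: str, up_to_lsn: Optional[int] = None) -> Dict[str, Any]:
    """Rebuild state from the log, optionally stopping at a chosen record."""
    state: Dict[str, Any] = {}
    if not os.path.exists(path):
        return state
    with open(path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            if up_to_lsn is not None and record["lsn"] > up_to_lsn:
                break
            state[record["key"]] = record["value"]
    return state


with tempfile.TemporaryDirectory() as workdir:
    wal_path = os.path.join(workdir, "wal.log")
    db = TinyWAL(wal_path)
    db.set("a", 1)
    db.set("b", 2)
    db.set("a", None)                              # the "accidental" overwrite
    after_crash = replay(wal_path)                 # full crash recovery
    before_mistake = replay(wal_path, up_to_lsn=2) # point-in-time recovery
```

Replaying the whole log reproduces the state at the moment of the crash, while stopping at an earlier record recovers the state from before the bad write. A real system would start from a checkpoint or base snapshot instead of replaying from the beginning.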
Choosing a Model and Living With the Trade-Off
The honest framing is the CAP and PACELC tension: when the network splits, you choose between staying consistent and staying available, and even when the network is healthy you trade latency against consistency. Strong models like linearizability require coordination on every operation, which means cross-node round trips, higher tail latency, and an inability to serve writes when a quorum cannot be reached. Weak models skip the coordination, answer locally, and stay up through partitions, but hand you the job of resolving the disagreements they allow.
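The quorum arithmetic behind this trade-off fits in two small functions. This is a sketch under the usual assumption of `n` replicas, a write quorum of `w`, and a read quorum of `r`; the function names are hypothetical. Overlapping quorums are a necessary ingredient for strong reads, not by themselves a proof of linearizability.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read quorum shares at least one replica with every write quorum."""
    return r + w > n


def can_serve_write(reachable: int, w: int) -> bool:
    """A write can be acknowledged only if enough replicas are reachable."""
    return reachable >= w


# Three replicas, majority quorums: reads intersect the latest write,
# but a partition that leaves one replica reachable blocks writes.
strong_overlap = quorums_overlap(n=3, r=2, w=2)
strong_survives = can_serve_write(reachable=1, w=2)

# Three replicas, single-replica quorums: writes survive the same partition,
# but a read may land on a replica that missed the latest write.
weak_overlap = quorums_overlap(n=3, r=1, w=1)
weak_survives = can_serve_write(reachable=1, w=1)
```

The two configurations sit at opposite ends of the choice described above: the first gives up availability during the partition, the second gives up the guarantee that a read sees the latest write.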
The right answer is per-feature, not per-system. A bank ledger or a seat-booking flow wants serializability or linearizability, because showing two truths there is a financial bug. A social feed, a like counter, or a presence indicator is perfectly happy with causal or eventual consistency, because a reader seeing a slightly stale count costs nothing and the latency savings are large. Most real products mix both: a strong core for the parts that move money or enforce uniqueness, and a fast eventually-consistent layer for everything else.
When you accept weak consistency, you need a strategy for the conflicts it produces, which is where the last cluster of lessons lives. Conflict resolution covers the policies: last-write-wins is simple but silently drops data, while smarter merges preserve intent. CRDTs (conflict-free replicated data types) are data structures whose merge operation is mathematically guaranteed to converge no matter the order updates arrive, which is how shopping carts and counters stay correct without a coordinator. Operational transformation solves the same merge problem for ordered text, transforming concurrent edits against each other so two people typing in the same paragraph end up with the same document.
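A grow-only counter (G-Counter) is one of the simplest CRDTs and shows why the merge converges. The sketch below is a minimal illustration with a hypothetical `GCounter` class. Each node increments only its own slot, and merging takes the per-node maximum, which is commutative, associative, and idempotent.

```python
from typing import Dict


class GCounter:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts: Dict[str, int] = {}   # node id -> increments seen from it

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-node maximum: applying the same merge twice changes nothing.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


# Two replicas count likes independently while disconnected.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)

# Merge in both directions, and once more to show duplicates are harmless.
a.merge(b)
b.merge(a)
a.merge(b)
total_a, total_b = a.value(), b.value()
```

Under last-write-wins the same scenario would end with a count of 2 or 3, depending on which replica's write happened to be kept. The CRDT keeps both contributions and arrives at 5 on every replica.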
How Real Systems Apply These Ideas
Google Spanner is the headline example of buying strong consistency at scale. It uses tightly synchronized clocks, backed by GPS receivers and atomic clocks and exposed through the TrueTime API, to provide externally consistent (linearizable) transactions across data centers, which is why Google runs critical systems on a database that behaves like a single machine spanning continents. The price is that every read-write commit waits out a small clock-uncertainty window. Amazon's DynamoDB takes the opposite stance by default, offering eventually consistent reads for speed and cost, with strongly consistent reads available as an explicit, more expensive option, so teams choose the guarantee per query.
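The commit wait can be illustrated with a simulated clock. This is a rough sketch, not Spanner's code: `FakeTrueTime`, `Interval`, and `commit_with_wait` are hypothetical, time is an integer count of microseconds, and the clock is advanced by hand. The idea it shows is that a commit picks a timestamp at the upper bound of the current uncertainty interval, then waits until that timestamp is guaranteed to be in the past.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Interval:
    earliest: int   # true time is guaranteed to be >= earliest
    latest: int     # true time is guaranteed to be <= latest


class FakeTrueTime:
    """Simulated clock with a fixed uncertainty bound (microseconds)."""

    def __init__(self, epsilon_us: int):
        self.true_time_us = 0
        self.epsilon_us = epsilon_us

    def now(self) -> Interval:
        return Interval(self.true_time_us - self.epsilon_us,
                        self.true_time_us + self.epsilon_us)

    def advance(self, micros: int) -> None:
        self.true_time_us += micros


def commit_with_wait(tt: FakeTrueTime, tick_us: int = 1000) -> Tuple[int, int]:
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    commit_ts = tt.now().latest
    waited_us = 0
    while tt.now().earliest <= commit_ts:   # timestamp might still be "now"
        tt.advance(tick_us)                 # stand-in for sleeping
        waited_us += tick_us
    return commit_ts, waited_us


tt = FakeTrueTime(epsilon_us=4000)
commit_ts, waited_us = commit_with_wait(tt)
```

In this simulation the wait comes out to roughly twice the uncertainty bound, which is why keeping clock uncertainty small matters so much for commit latency.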
Collaborative editors made these models visible to everyone. Google Docs is built on operational transformation so dozens of people can type at once and converge on the same document. Many newer tools lean on CRDTs instead, since CRDTs do not require the central transformation server that OT typically needs. Figma describes its multiplayer system as CRDT-inspired, although it still routes changes through a central server. Apple's Notes and offline-first mobile apps use CRDT-style merging so edits made on a plane reconcile cleanly when the device reconnects.
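The core move in operational transformation is rewriting one edit so it still means the same thing after a concurrent edit has been applied. The sketch below handles only two concurrent inserts and uses a site name as a tiebreaker. `Insert`, `apply_op`, and `transform` are hypothetical names, and this is far simpler than what a production editor does (no deletes, no server-side ordering, no history).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int     # index in the document where text is inserted
    text: str
    site: str    # who made the edit; used only to break position ties


def apply_op(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has already been applied."""
    shifts = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if shifts:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


doc = "ac"
from_alice = Insert(pos=1, text="X", site="alice")
from_bob = Insert(pos=1, text="Y", site="bob")     # same position, concurrently

# Alice applies her own edit first, then Bob's edit transformed against hers.
alice_view = apply_op(apply_op(doc, from_alice), transform(from_bob, from_alice))

# Bob applies his own edit first, then Alice's edit transformed against his.
bob_view = apply_op(apply_op(doc, from_bob), transform(from_alice, from_bob))
```

Both sites apply the edits in a different order and still end with the same text, because the tiebreaker makes each side agree on whose insert goes first.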
On the durability side, every serious database you have used runs the patterns in this category. PostgreSQL's write-ahead log is the source of truth for crash recovery and for streaming replication to standbys. Its base backups plus archived WAL are exactly what point-in-time recovery reads to rewind a database to the second before a bad migration. Redis offers both snapshotting (periodic point-in-time RDB dumps) and an append-only file (AOF) that logs every write, letting operators trade durability against write throughput. These are not exotic features. They are the reason your data is still there after the process that wrote it crashed.