Database Fundamentals
Almost every outage you will ever debug traces back to the database. A query that ran in 5 milliseconds on your laptop with a thousand rows takes 40 seconds in production with 50 million, and now your checkout page is timing out and customers are leaving. The difference is rarely the application code. It is whether the right index exists, whether the table was partitioned, whether reads were sent to a replica, and whether the schema was designed for how the data is actually queried. These are not advanced topics you grow into later. They are the foundation, and ignoring them is the single most common reason small systems fall over as they grow.
This category covers how relational databases actually store, find, and protect your data, and how you keep them fast as the row count climbs. You will learn what an index really is and why B-trees power almost all of them, when to normalize versus denormalize, how to read a query plan and fix the slow part, and how the big scaling moves work: read replicas, replication topologies, partitioning, and sharding. Every lesson is built to be understood by a beginner and still respected by an engineer who has run databases in production.
What database fundamentals actually cover
A database is two jobs glued together: store data durably, and find it again quickly. Most of the topics here exist to make the second job fast without breaking the first. When you write a row, the database has to put it somewhere on disk, keep enough structure around it to find it later, and make sure a crash mid-write does not corrupt anything. When you read, it has to locate the rows you asked for among millions of others without scanning all of them.
The lessons start with the building blocks you touch every day. Database Indexing and B-Trees explain how a query finds 3 rows out of 50 million in milliseconds instead of reading the whole table. Database Normalization and Database Denormalization cover how you shape tables: splitting data to avoid duplication and update bugs, or deliberately duplicating it to make reads faster. Query Optimization teaches you to read the plan the database produces and understand why it chose a scan over an index.
From there the category moves into the parts of a database that hold logic and reduce repeated work: Database Views, Materialized Views, Database Functions, Stored Procedures, and Database Triggers. A view is a saved query you can treat like a table. A materialized view stores the result so you do not recompute it on every read. Triggers and stored procedures push logic into the database itself, which is powerful and also a common source of hidden behavior, so the lessons cover both the use and the cost.
Keeping a database healthy: deletes, versions, and maintenance
Deleting data is not as simple as it sounds, which is why this category spends real time on it. Hard Delete removes the row for good. Soft Delete keeps it and flags it as deleted, which is what you usually want when you need an audit trail, undo, or to avoid breaking foreign keys. Tombstoning is the distributed-systems cousin: a marker that says this record is gone, used so that replicas and log-structured stores agree on the deletion. Choosing wrong here leads to either lost data you needed or tables full of garbage you can never clean up.
Keeping history is its own discipline. Versioned Data and Temporal Tables let you answer questions like what did this record look like last Tuesday, which matters for billing disputes, compliance, and debugging. These patterns are how systems give you a clean undo and a defensible record of change without bolting on a separate audit system later.
Databases also need ongoing maintenance, and ignoring it is how a fast database slowly becomes a slow one. Vacuum reclaims space from rows that were updated or deleted but still sit on disk. Compaction merges and cleans up storage in log-structured engines. Index Rebuilding restores an index that has become bloated or fragmented so lookups stay fast. None of this is glamorous, but a database that is never vacuumed or compacted will eventually grind to a halt with no code change to blame.
Scaling reads and writes: replication, partitioning, sharding
One database server can only do so much. When read traffic outgrows a single machine, Read Replicas are the first move: copies that serve queries while writes still go to the primary. Database Replication explains the general mechanism, and Master-Slave Replication versus Master-Master Replication covers the two main shapes. Master-slave (one writer, many readers) is simpler and the default for most systems. Master-master lets multiple nodes accept writes, which buys availability but forces you to handle write conflicts, so reach for it only when you genuinely need it.
When the data itself is too big for one machine, you split it. Table Partitioning breaks one logical table into smaller physical pieces. Vertical Partitioning splits by columns, putting rarely-used or large columns in a separate store. Horizontal Partitioning splits by rows, and the lessons cover the strategies for choosing which rows go where: Range Partitioning (by value ranges, like dates), Hash Partitioning (by a hash for even spread), List Partitioning (by explicit categories, like region), and Composite Partitioning (combining strategies). The choice determines whether your queries hit one partition or all of them.
Database Sharding takes partitioning across separate servers and is the heaviest scaling tool here. It can take you to billions of rows, but it complicates everything: cross-shard queries, joins, transactions, and rebalancing all get harder. The lessons are honest that sharding is a last resort after indexing, replicas, and partitioning have been exhausted. Connection management matters too, which is why Database Connection Pooling is covered: opening a fresh connection per request is one of the quietest ways to overload an otherwise healthy database.
How real systems use these
Every large product is built on these patterns stacked together. A typical high-traffic application sends reads to a pool of read replicas and writes to a primary, with connection pooling in front so a traffic spike does not exhaust the database's connection limit. Hot tables are partitioned by date or by customer so old data can be archived and queries only touch the relevant slice. Indexes are tuned against the actual query plans rather than guessed at.
When a single primary can no longer hold the write volume, companies shard. Platforms running at the scale of social feeds, payments, or messaging shard by user id or tenant so each shard owns a manageable slice, then rely on careful index design and materialized views to keep per-shard queries fast. Soft deletes and temporal or versioned tables back the audit and compliance requirements that come with handling money or personal data.
The pattern to notice is order. Teams that scale well reach for the cheap, low-risk tools first: add an index, read a query plan, add a replica, partition a table. They turn to sharding and multi-master only when they have to, because those add operational cost that never goes away. Learning these fundamentals in that order is exactly how you avoid the classic mistake of jumping to a complex distributed setup to solve a problem that one well-placed index would have fixed.