Is this a video course?

No. This is an interactive, slide-based learning platform. Each lesson has rich text, animated diagrams, live code editors, and quizzes. You learn by reading, interacting, and doing, not by watching videos passively.

How long do I have access?

Forever. Both pricing tiers are one-time payments with lifetime access. This includes all 766 current lessons and any future content we add.

What level of experience do I need?

None. We start from absolute basics like 'What is latency?' and build up to distributed consensus protocols. The Foundation level assumes zero prior knowledge of system design.

How much does the system design course cost?

5 US dollars for lifetime access globally, or 299 Indian rupees for lifetime access in India. One-time payment, no subscription, no hidden fees. 11 lessons are free with no signup required.

What technologies are covered?

Everything from DNS and load balancers to Kubernetes, Kafka, distributed databases, consensus protocols, stream processing, security architecture, and observability. We cover principles and real-world implementations used at Netflix, Google, Amazon, Uber, Stripe, and more.

Is this useful for system design interview preparation?

Yes. The lessons are structured around the exact topics asked in system design interviews at FAANG and top-tier companies. Interactive diagrams help you practice whiteboard-style explanations. The lessons cover everything from URL shortener design to distributed payment systems.

How is this different from ByteByteGo or Educative?

It offers 766 interactive lessons (4x more than most competitors), 16 different diagram types that build step by step, and real production examples from Netflix, Google, Amazon, Uber, and Stripe. You get lifetime access for a one-time payment of 5 dollars, instead of an annual subscription costing 100 to 200 dollars per year.

What is the difference between latency and throughput?

Latency is how long a single request takes from send to response, usually measured in milliseconds. Throughput is how many requests a system completes per second. They move independently. A system can answer one request quickly but choke under volume, or handle huge volume while each individual response is slow. Optimizing one does not automatically improve the other, which is why it matters to know which one your problem actually is.

Should I scale vertically or horizontally?

Vertical scaling (a bigger machine) is simpler and is often the right first step, but it has a hard ceiling and leaves you with a single point of failure. Horizontal scaling (more machines) has effectively no ceiling and improves resilience, but it only works cleanly if your services are stateless so any server can handle any request. Most production systems start vertical for simplicity and move horizontal once traffic, availability, or cost forces the change.

Why does idempotency matter so much in distributed systems?

Networks fail. A response gets lost, the client never hears back, and it retries the request. Without idempotency that retry runs the operation a second time, which is fine for a read but disastrous for something like a payment or an order. An idempotent operation produces the same result whether it runs once or five times, which is what makes safe retries possible. Since retries are unavoidable at scale, idempotency is what keeps them from causing damage.

When should I choose ACID over BASE, or SQL over NoSQL?

Choose ACID and SQL when correctness and consistency are non-negotiable and your data is relational, for example financial records, inventory, or anything where a half-completed transaction is unacceptable. Choose BASE and NoSQL when you need to scale horizontally across many machines, can tolerate data becoming consistent a moment later rather than instantly, and have access patterns that do not need complex joins. It is a trade-off between strong guarantees and the ability to scale wide while staying available.

What is the difference between stateless and stateful, and why does it affect scaling?

A stateless service keeps no memory of previous requests between calls, so every request carries everything it needs and any server can handle it. A stateful service stores information locally, which ties a given user to a specific server. Stateless designs scale horizontally with ease because you can add or remove servers freely and a load balancer can route anywhere. Stateful designs fight horizontal scaling, which is why session data is usually pushed out to a shared store like a cache or database.

Why do input validation and output encoding both matter?

They guard the two directions of the boundary. Input validation means never trusting data that comes from outside and checking it before it reaches your logic, which blocks malformed and malicious payloads. Output encoding means safely formatting data on the way out so it can never be interpreted as executable code, which is the main defense against injection attacks like cross-site scripting. Validate on the way in, encode on the way out, and you eliminate a whole class of common breaches.

Core Fundamentals

Every large system that has ever fallen over did so for a reason that traces back to something on this page. A checkout page that times out on Black Friday, an API that double-charges a customer because a retry fired twice, a database that locks up the moment traffic triples. None of those are exotic problems. They are failures to respect the basics: how fast a system answers, how much it can handle at once, what happens when you add a second server, and what a request is allowed to assume about the request before it.

Core fundamentals is the vocabulary and the mental model everything else is built on. Before you can reason about message queues, sharding, or consensus, you need to know the difference between latency and throughput, why a stateless service scales and a stateful one fights you, and when ACID guarantees are worth the cost versus when you trade them away for availability. The 26 lessons here are the foundation. Get them right and the advanced material reads like common sense. Skip them and you will keep relearning the same lessons in production, at 3 AM, with customers watching.

Core Fundamentals: the landscape

Performance: latency, throughput, and bandwidth are not the same thing

People use these three words interchangeably and then make bad decisions because of it. Latency is how long one request takes from send to response, measured in milliseconds. Throughput is how many requests you can complete per second. Bandwidth is the raw capacity of the pipe, how much data can move through it. A system can have low latency and low throughput, or high throughput and terrible latency. They move independently.
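A quick back-of-the-envelope calculation makes the independence concrete. This is a minimal sketch under an idealized assumption: every worker handles one request at a time and is never idle. The function name and the numbers are illustrative, not measurements from a real system.

```python
def throughput_rps(latency_ms: float, concurrent_workers: int) -> float:
    """Requests completed per second if each worker serves one request at a time.

    Idealized model (an assumption): workers are always busy and do not contend.
    """
    return concurrent_workers * (1000.0 / latency_ms)


# Fast but narrow: each request takes 10 ms, but only one worker serves them.
fast_narrow = throughput_rps(latency_ms=10, concurrent_workers=1)

# Slow but wide: each request takes 500 ms, but 200 workers run in parallel.
slow_wide = throughput_rps(latency_ms=500, concurrent_workers=200)

print(f"10 ms latency, 1 worker:     {fast_narrow:.0f} req/s")
print(f"500 ms latency, 200 workers: {slow_wide:.0f} req/s")
```

In this model the system with 50 times worse latency still completes four times as many requests per second, because throughput also depends on how much work runs in parallel.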

The classic trap is optimizing one and assuming the other followed. You speed up a single database query and feel good, but under load the server is queuing requests and the latency a real user sees has tripled. Or you add bandwidth expecting things to feel faster, but the bottleneck was processing time, not the network, so nothing changes. The slowest component dominates total latency. A 500ms database call makes your 1ms network irrelevant.
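To see why the slowest component dominates, it helps to add the numbers up. The sketch below uses made-up timings (the 1ms network and 500ms database call from the paragraph above, plus a hypothetical 4ms of application logic) and assumes the steps run one after another.

```python
# Hypothetical per-component timings for one request, in milliseconds.
components_ms = {"network": 1, "app_logic": 4, "database": 500}

total_ms = sum(components_ms.values())
slowest = max(components_ms, key=components_ms.get)
share = components_ms[slowest] / total_ms


def total_after_speedup(name: str, factor: float) -> float:
    """Total latency if one component becomes `factor` times faster."""
    return total_ms - components_ms[name] + components_ms[name] / factor


print(f"total: {total_ms} ms, {slowest} accounts for {share:.0%}")
print(f"network 10x faster:  {total_after_speedup('network', 10):.1f} ms")
print(f"database 10x faster: {total_after_speedup('database', 10):.1f} ms")
```

Making the network ten times faster saves less than a millisecond, while the same speedup on the database cuts the total by roughly 90 percent. That is the reason to measure before you optimize.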

The lessons on synchronous and asynchronous processing connect directly here. A synchronous call makes the caller wait for the result, which is simple to reason about but ties up resources and stacks latency. Asynchronous processing lets the caller move on while the work happens in the background, which is how you keep throughput high when individual operations are slow. Knowing which one a given workload needs is one of the most common real design decisions you will make.
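Here is a small sketch of the difference using Python's asyncio. The `slow_operation` function is a hypothetical stand-in for a slow I/O call; it only sleeps. The first runner waits for each call before starting the next, the second starts them all and waits once.

```python
import asyncio
import time


async def slow_operation(name: str, seconds: float) -> str:
    """Hypothetical slow I/O call; sleeps instead of doing real work."""
    await asyncio.sleep(seconds)
    return name


async def run_sequentially(names, seconds):
    """Wait for each call to finish before starting the next one."""
    start = time.perf_counter()
    results = [await slow_operation(n, seconds) for n in names]
    return results, time.perf_counter() - start


async def run_concurrently(names, seconds):
    """Start every call up front, then wait for all of them together."""
    start = time.perf_counter()
    results = await asyncio.gather(*(slow_operation(n, seconds) for n in names))
    return list(results), time.perf_counter() - start


names = ["a", "b", "c"]
_, sequential_s = asyncio.run(run_sequentially(names, 0.05))
_, concurrent_s = asyncio.run(run_concurrently(names, 0.05))
print(f"sequential: {sequential_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

Each individual operation is just as slow in both versions. What changes is how much work finishes per unit of time, which is the throughput gain the paragraph above describes.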

Scaling: vertical, horizontal, and the statelessness that makes it possible

When traffic grows, you have two moves. Vertical scaling means making one machine bigger: more CPU, more memory. It is the easy answer and it works until it does not, because there is a ceiling and a single machine is a single point of failure. Horizontal scaling means adding more machines and spreading the work across them. It has no real ceiling, but it forces you to answer a hard question: when a user's second request lands on a different server than their first, does anything break?

That question is why stateless versus stateful is on this list right next to scaling. A stateless service keeps no memory of past requests between calls, so any server can handle any request and you can add or remove servers freely. A stateful service remembers things locally, which means a specific user is tied to a specific server, and now horizontal scaling becomes a fight. Session management is the practical version of this problem. Where do you keep a logged-in user's session so that any server in the pool can serve them?
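One way to picture the session problem is with two toy server classes. Everything here is hypothetical and in-memory: `SharedSessionStore` stands in for an external cache or database, and the servers are plain objects rather than real processes.

```python
class SharedSessionStore:
    """In-memory stand-in for an external store such as a cache or database."""

    def __init__(self):
        self._sessions = {}

    def save(self, session_id, user):
        self._sessions[session_id] = user

    def load(self, session_id):
        return self._sessions.get(session_id)


class StatefulServer:
    """Keeps sessions in its own memory, so only this server knows them."""

    def __init__(self):
        self._local_sessions = {}

    def login(self, session_id, user):
        self._local_sessions[session_id] = user

    def whoami(self, session_id):
        return self._local_sessions.get(session_id)


class StatelessServer:
    """Keeps nothing locally; every lookup goes to the shared store."""

    def __init__(self, store):
        self._store = store

    def login(self, session_id, user):
        self._store.save(session_id, user)

    def whoami(self, session_id):
        return self._store.load(session_id)


# Stateful: the login lands on server A, the next request lands on server B.
a, b = StatefulServer(), StatefulServer()
a.login("s1", "alice")
print("stateful, second server sees:", b.whoami("s1"))

# Stateless: both servers read and write the same shared store.
store = SharedSessionStore()
c, d = StatelessServer(store), StatelessServer(store)
c.login("s1", "alice")
print("stateless, second server sees:", d.whoami("s1"))
```

The stateful pair loses the user as soon as the request moves to a different server. The stateless pair does not care which server answers, which is what lets a load balancer route anywhere.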

Scalability, elasticity, and load balancing complete the picture. Scalability is whether your system can grow at all. Elasticity is whether it can grow and shrink automatically as demand changes, which is what cloud autoscaling sells you. Load balancing is the traffic cop that spreads incoming requests across your pool so no single server gets buried. Caching sits alongside all of it as the cheapest performance win there is: store the answer once, serve it many times, and take the load off everything downstream.
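Two of these ideas fit in a few lines. The sketch below shows round-robin, which is only one of several load balancing strategies, and an in-process cache built on the standard library's `functools.lru_cache`. The server names and the `get_product` function are made up for illustration.

```python
import itertools
from collections import Counter
from functools import lru_cache


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation so load spreads evenly."""

    def __init__(self, servers):
        self._rotation = itertools.cycle(servers)

    def pick(self):
        return next(self._rotation)


@lru_cache(maxsize=128)
def get_product(product_id: int) -> str:
    """Hypothetical expensive lookup; the decorator remembers past answers."""
    return f"product-{product_id}"


balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
print(Counter(balancer.pick() for _ in range(9)))

for _ in range(5):
    get_product(42)
print(get_product.cache_info())
```

After five identical lookups the cache statistics report one miss and four hits: the expensive work ran once and the other four answers came from memory. A real deployment would more likely use a shared cache so that every server benefits, which is an assumption beyond what this sketch shows.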

Correctness: idempotency, timeouts, and the data guarantees behind them

Distributed systems fail in ways single programs do not. The network drops a response, the client never hears back, so it retries. Now the same operation runs twice. If that operation was charging a credit card, you have a furious customer. Idempotency is the property that running the same request twice has the same effect as running it once. It is not optional at scale. Retries are unavoidable, and idempotency is what makes them safe.
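A common way to get this property is an idempotency key: the client sends the same unique key with every retry, and the server replays the stored result instead of repeating the work. The class below is a minimal in-memory sketch with hypothetical names. A real service would need a durable store and an atomic check-and-set, which this example leaves out.

```python
class PaymentService:
    """Toy payment service that deduplicates requests by idempotency key."""

    def __init__(self):
        self.total_charged = 0
        self._processed = {}  # idempotency key -> result already returned

    def charge(self, idempotency_key: str, amount: int) -> dict:
        # A retry with a known key replays the stored result, with no new charge.
        if idempotency_key in self._processed:
            return self._processed[idempotency_key]

        self.total_charged += amount
        result = {"status": "charged", "amount": amount}
        self._processed[idempotency_key] = result
        return result


service = PaymentService()
first = service.charge("order-1001", 50)
retry = service.charge("order-1001", 50)  # client retried after a lost response
print(first == retry, service.total_charged)
```

The retry returns the same response as the first call, and the customer is charged once. A request with a different key is treated as a new operation.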

Timeouts are the other half of surviving failure. A connection timeout caps how long you wait to establish a connection. A request timeout caps how long you wait for the answer. Without them, one slow dependency can hold every thread hostage and cascade into a full outage. Set them too tight and you get false failures. Set them too loose and a sick service drags everything down with it. There is no universal number, only a trade-off you have to reason about per dependency.
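The sketch below shows a request timeout only, using `asyncio.wait_for` from the standard library. The dependency is simulated with a sleep, and the fallback string is a placeholder for whatever degraded answer makes sense in your system.

```python
import asyncio


async def slow_dependency(delay_s: float) -> str:
    """Simulated downstream call that takes `delay_s` seconds to answer."""
    await asyncio.sleep(delay_s)
    return "fresh response"


async def call_with_timeout(delay_s: float, timeout_s: float) -> str:
    """Give up after `timeout_s` seconds instead of waiting indefinitely."""
    try:
        return await asyncio.wait_for(slow_dependency(delay_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The caller gets control back and can degrade gracefully.
        return "fallback response"


print(asyncio.run(call_with_timeout(delay_s=0.01, timeout_s=0.5)))
print(asyncio.run(call_with_timeout(delay_s=0.5, timeout_s=0.05)))
```

The healthy call returns its real answer. The slow one is cut off at the limit, so the caller is free again after 50 milliseconds rather than being stuck for the full half second.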

Under all of this sit the data guarantees. ACID properties (atomicity, consistency, isolation, durability) are the strong promises a traditional database makes: a transaction either fully happens or fully does not, and once committed it stays committed. BASE (basically available, soft state, eventual consistency) is the looser model many distributed systems choose so they can stay available under partition and scale wide. SQL versus NoSQL is largely this same choice expressed as a database category. You pick based on whether your workload needs strict consistency and rich queries or whether it needs to scale horizontally and tolerate eventual consistency.
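Atomicity is the easiest of the four ACID properties to see in code. This sketch uses Python's built-in sqlite3 module with an in-memory database. The table, the account names, and the `CHECK` constraint that forbids negative balances are assumptions chosen to force a failure halfway through a transfer.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money in one transaction: both updates apply, or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            # Overdrawing violates the CHECK constraint and aborts the transaction.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = make_db()
print(transfer(conn, "alice", "bob", 500), balances(conn))
print(transfer(conn, "alice", "bob", 40), balances(conn))
```

The failed transfer had already credited bob when the debit was rejected, yet the rollback leaves both balances exactly as they were. That is the "fully happens or fully does not" promise in practice.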

Contracts: APIs, schemas, and validation at the boundary

A system is only as reliable as the agreements between its parts. REST is the dominant style for those agreements over HTTP, a set of conventions for how services expose resources and how clients talk to them. The value of a convention is predictability. Anyone who knows REST can pick up your API and guess how it works, which is why it became the default.

But a convention is not enough on its own. JSON Schema and XML Schema let you write down exactly what a valid message looks like, so both sides agree on shape and types before anything goes wrong. API documentation through Swagger and OpenAPI turns that contract into something humans and tools can read, generate clients from, and test against. Semantic versioning is how you evolve the contract without breaking everyone who depends on it. A version number has three parts, MAJOR.MINOR.PATCH: a patch bump signals a backward-compatible fix, a minor bump adds backward-compatible features, and a major bump warns that compatibility may break.
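The semantic versioning rule is simple enough to express as a small function. This is a simplified sketch that handles only plain MAJOR.MINOR.PATCH strings and ignores pre-release and build metadata. The function names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_upgrade(old: str, new: str) -> str:
    """Say what kind of change a version bump promises to consumers."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        return "not an upgrade"
    if new_v[0] != old_v[0]:
        return "major: may break compatibility"
    if new_v[1] != old_v[1]:
        return "minor: adds backward-compatible features"
    return "patch: backward-compatible fixes"


for new in ("1.4.3", "1.5.0", "2.0.0"):
    print(f"1.4.2 -> {new}: {classify_upgrade('1.4.2', new)}")
```

Comparing the parts as integers matters: as numbers, 1.10.0 is newer than 1.9.0, while a plain string comparison would get that wrong.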

The boundary is also where security lives. Input validation means never trusting what comes in from outside, checking every field before it touches your logic, because attackers send malformed and malicious data on purpose. Output encoding means safely formatting what you send back so that data can never be misread as code, which is the core defense against injection attacks. Validate on the way in, encode on the way out. These two habits prevent an entire family of breaches and belong in your reflexes from day one.
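Both habits can be sketched with the standard library. The username rule below (3 to 20 letters, digits, or underscores) is an arbitrary example policy, and `html.escape` covers encoding for an HTML context only. Other output contexts, such as SQL or shell commands, need their own defenses.

```python
import html
import re

# Example policy (an assumption): 3 to 20 letters, digits, or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def validate_username(raw: str) -> str:
    """Reject anything that does not match the allowed shape before using it."""
    if not USERNAME_PATTERN.fullmatch(raw):
        raise ValueError("username must be 3-20 letters, digits, or underscores")
    return raw


def render_comment(comment: str) -> str:
    """Encode user text so the browser shows it as text, never runs it as markup."""
    return f"<p>{html.escape(comment)}</p>"


print(validate_username("alice_01"))
print(render_comment("<script>alert('x')</script>"))
```

The script tag comes out as harmless escaped text such as `&lt;script&gt;`, so the browser displays it instead of executing it. Validation narrows what gets in, and encoding makes sure that whatever did get in cannot act as code on the way out.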

Learn Core Fundamentals the interactive way

All 26 lessons with step by step diagrams, runnable code, and quizzes. One payment of ₹299 in India or $5 worldwide. Lifetime access, no subscription.
