Is this a video course?

No. This is an interactive, slide-based learning platform. Each lesson has rich text, animated diagrams, live code editors, and quizzes. You learn by reading, interacting, and doing, not by watching videos passively.

How long do I have access?

Forever. Both pricing tiers are one-time payments with lifetime access. This includes all 766 current lessons and any future content we add.

What level of experience do I need?

None. We start from absolute basics like 'What is latency?' and build up to distributed consensus protocols. The Foundation level assumes zero prior knowledge of system design.

How much does the system design course cost?

5 US dollars for lifetime access globally, or 299 Indian rupees for lifetime access in India. One-time payment, no subscription, no hidden fees. 11 lessons are free with no signup required.

What technologies are covered?

Everything from DNS and load balancers to Kubernetes, Kafka, distributed databases, consensus protocols, stream processing, security architecture, and observability. We cover principles and real-world implementations used at Netflix, Google, Amazon, Uber, Stripe, and more.

Is this useful for system design interview preparation?

Yes. The lessons are structured around the exact topics asked in system design interviews at FAANG and top-tier companies. Interactive diagrams help you practice whiteboard-style explanations. The lessons cover everything from URL shortener design to distributed payment systems.

How is this different from ByteByteGo or Educative?

766 interactive lessons (4x more than most competitors), 16 different diagram types that build step by step, real production examples from Netflix, Google, Amazon, Uber, and Stripe, and lifetime access for a one-time payment of 5 dollars instead of annual subscriptions costing 100 to 200 dollars per year.

How do you prevent overselling during a flash sale?

Never read-then-write inventory in application code. Use an atomic conditional decrement (UPDATE inventory SET available_qty = available_qty - 1 WHERE sku_id = ? AND warehouse_id = ? AND available_qty >= 1) and check whether the row was actually updated; or, for a single hot SKU, serialize all decrements through one atomic counter (for example Redis DECR, or an equivalent atomic operation in Aerospike) or a single-partition queue so the database sees an ordered stream. Combine with a reservation TTL so stock held for an unpaid checkout is auto-returned. The principle: always prefer rejecting a request ('sold out') over selling the same unit twice.

Why use a saga instead of a single database transaction for checkout?

Checkout spans inventory, payment, and order services with separate data stores. A single ACID transaction or two-phase commit across them creates distributed locks that don't scale and collapse under Big Billion Days write load. A saga runs each step as a local transaction with a compensating action (release the reservation if payment fails, refund if fulfillment fails). Flipkart open-sourced Flux, a state-machine framework that implements exactly this for order fulfillment. The tradeoff is no isolation — intermediate states like 'pending payment' are visible — so every step must be idempotent.

How do you make checkout idempotent so users aren't double-charged?

The client sends an idempotency key per checkout attempt and resends the same key on every retry. The orders table has a UNIQUE constraint on that key, so the first insert wins and retries return the existing order instead of creating a new one. Carry the same key into the payment call so the payment provider also dedupes (authorize-then-capture, capture once). For multi-step order workflows, derive each step's key from the saga id plus step name so a crashed orchestrator can replay a step as a no-op.

How does Flipkart handle the read-heavy catalog vs. the write-heavy checkout?

By splitting them (CQRS). The read path — browse, search, product pages — is served from a CDN plus a fast KV cache (Flipkart uses Aerospike at ~90M QPS aggregate, sub-millisecond latency) and a search index, with eventual consistency and aggressive caching. The write path — cart, inventory reservation, checkout — uses strongly-consistent stores and idempotent, serialized operations. Changes flow from the transactional source of truth to the caches and search index via a Kafka change stream.

What actually happens to traffic during Big Billion Days, and how do you plan for it?

Traffic runs roughly 5-10x normal. You can't rely on reactive autoscaling alone because flash sales spike in a single second and the bottleneck is often one un-scalable hot inventory key. So you load-test against synthetic BBD traffic, pre-warm caches and pre-scale ahead of the sale, put a virtual waiting room and rate limits in front of checkout, define a degradation ladder (shed recommendations and reviews before touching the browse->cart->checkout money path), and use circuit breakers plus chaos testing so failover is proven. Flipkart runs this on its Kubernetes-based Flipkart Cloud Platform, bursting to GCP for peaks.

How is inventory kept consistent across multiple warehouses?

Model inventory per SKU per warehouse (composite key) so geo-routing can fulfill from the nearest stock and each warehouse's count is independently consistent. A reservation picks a specific warehouse's stock to decrement. Cross-warehouse views (total available) are computed as an eventually-consistent aggregate for display, but the authoritative decrement always happens against one warehouse row, so there's a single serialization point per (SKU, warehouse) and no oversell.

Design Flipkart: System Design Interview Guide

During Big Billion Days, Flipkart traffic runs 5-10x its normal load, and its Aerospike fleet alone serves ~90 million queries per second at sub-millisecond latency across search, pricing, ads, and inventory.

A complete system design walkthrough of an e-commerce platform like Flipkart, built for India's Big Billion Days. We cover the read-heavy catalog and search path, the write-heavy checkout path, how inventory stays consistent without overselling during flash sales, cart reservation with TTLs, the order-management state machine (Flipkart's open-source Flux), and how to absorb a thundering herd with virtual waiting rooms, rate limits, and queues.

Asked at: Flipkart, Amazon, Walmart Global Tech, Myntra, Meesho, Swiggy, Zomato, PhonePe, and most product-company interviews in India. "Design an e-commerce platform / flash sale / Big Billion Days" is one of the most common SDE2/SDE3 prompts because it exercises caching, consistency, and high-write contention in a single question.

Why this question is asked

E-commerce is the canonical interview problem because a single question forces you to reconcile two opposing workloads: a massively read-heavy catalog (which wants aggressive caching and eventual consistency) and a write-heavy, contention-prone checkout (which demands strong consistency on inventory so you never sell the same unit twice). The flash-sale angle adds a thundering-herd dimension — millions of users hitting one SKU in the same second — which separates candidates who can recite CAP from candidates who can actually shape load. Interviewers use it to probe whether you understand idempotency, distributed transactions vs. sagas, cache invalidation, and graceful degradation under 10x traffic.

Requirements

Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.

Functional requirements

Browse and search a product catalog of 100M+ listings with filters (category, brand, price, rating), facets, and autocomplete
View a product detail page (PDP) with price, ratings, seller, and live availability
Add items to cart; cart persists across sessions and devices
Place an order: address, payment, and a confirmed order id, with no double-charging on retries
Decrement inventory atomically so a unit is never sold twice (no oversell), even under flash-sale contention
Run time-boxed flash sales / deals where a limited-stock SKU opens at a fixed time
Track order lifecycle: created -> payment authorized -> packed -> shipped -> delivered, with cancellations and refunds
Show personalized recommendations and category/deal pages
Let sellers manage listings, price, and stock; reflect stock changes in near real-time

Non-functional requirements

Catalog read path p99 < 150ms; search results p99 < 300ms
Withstand 5-10x baseline traffic during Big Billion Days without manual intervention
Strong consistency on the inventory ledger (never oversell); eventual consistency acceptable for catalog, reviews, recommendations
Checkout must be idempotent: a retried/duplicate request produces exactly one order and one charge
High availability (target 99.95%+) for browse/search; checkout may degrade gracefully (queue) rather than fail
Horizontal scalability — every tier scales independently (search, cart, inventory, orders)
Durability: orders, payments, and inventory mutations are never lost (replicated, persisted before ack)
India-first latency: serve from in-country data centers / CDN edge close to Tier-1 and Tier-2 cities

Back-of-envelope scale estimates

Show your math. Pulling numbers from thin air signals you have not thought about the load.

Registered users / monthly actives

~500M registered, ~150-200M MAU (est.)

Flipkart is one of India's two largest e-commerce platforms; public figures put the user base in the hundreds of millions. Treat exact MAU as an estimate for sizing.

Catalog size

100M+ active listings

A horizontal marketplace with millions of sellers easily reaches 9-figure SKU counts across categories. Drives the search index size and catalog cache footprint.

Baseline vs. BBD traffic

5-10x spike during Big Billion Days

Stated directly in Flipkart engineering interviews/case studies — sale traffic is 5x to 10x normal. This multiplier is the central capacity-planning input.

Read:write ratio on catalog

~100:1 to 1000:1 reads:writes

A PDP is viewed millions of times between price/stock changes. Justifies heavy caching and a separate read path from the write path.

Aerospike fleet QPS

~90M QPS aggregate, sub-ms latency

Per Aerospike's published Flipkart case study: ~90M QPS across three DCs and 50+ use cases (search bar, recommendations, ads, pricing, inventory) at sub-millisecond latency, 200+ clusters.

Flash-sale contention on one SKU

100K-1M+ requests/sec on a single hot key

A limited-stock deal (e.g., a phone) draws the whole audience to one product+inventory key in the same second — the classic thundering-herd / hot-key problem to design around.

Checkout write throughput (peak)

Tens of thousands of order writes/sec

Even a small fraction of browsers converting at peak generates a large, contention-heavy write load on the order and inventory stores.

High-level architecture

Start by splitting the system along the read/write seam, because that split drives every other decision. The read path serves browse, search, and product detail pages; it is enormous in volume but tolerant of staleness, so it leans on a CDN, multiple cache layers, and a search index. The write path serves cart, inventory reservation, checkout, and order management; it is smaller in volume but unforgiving: it must be strongly consistent and idempotent.

On the read path, a request from a browser or app first hits a CDN edge (static assets, images, and cacheable PDP fragments). Dynamic requests pass through an API gateway / load balancer into stateless service tiers. Catalog reads are served from a product service backed by a fast key-value store (Flipkart famously runs Aerospike here, serving the search bar, pricing, recommendations, ads, and inventory reads at ~90M QPS aggregate with sub-millisecond latency). Search and faceted filtering hit a dedicated inverted-index cluster (Elasticsearch/Solr-class), kept in sync from the catalog via a change stream. The source of truth for product data lives in a durable store and is denormalized into these read-optimized projections, which is classic CQRS.

On the write path, adding to cart writes to a cart service (a low-latency KV store keyed by user). Checkout is where the interesting consistency work happens: the system reserves inventory against a strongly-consistent inventory ledger (an atomic decrement with a reservation TTL), creates an order in a "pending" state, then drives the order through a distributed workflow (payment authorization, inventory commit, packing, shipping) using a saga / state machine. Flipkart open-sourced exactly this: Flux, a state-machine orchestration framework that models order fulfillment as states (Order Created, Confirmed, Packed, Shipped) with event-driven transitions, retries, and replay. Each step is idempotent (keyed by an idempotency key derived from the order/saga id), and failed steps trigger compensating actions rather than a single big ACID transaction across services.

For Big Billion Days, the architecture adds shock absorbers in front of the write path. Because traffic runs 5-10x normal and flash sales concentrate millions of users on a single SKU, you put a virtual waiting room / admission control in front of checkout, rate-limit per user and per SKU, and front the hot inventory key with an in-memory counter and a queue so the database sees a smooth, bounded write rate instead of a spike. Flipkart runs all of this on its own Flipkart Cloud Platform (FCP) over Kubernetes, bursting overflow workloads to GCP during peaks, with chaos testing and autoscaling so individual microservices scale independently based on demand.

In a real interview, sketch this on the whiteboard before diving into any single box.

Core components

Walk through each service. The interviewer wants to hear what each one owns, not just the names.

API Gateway / Load Balancer

Entry point for all client traffic. Terminates TLS, authenticates, routes to service tiers, and enforces global and per-user rate limits. During sales it also hosts admission control (virtual waiting room tokens) so the backend never sees more than it can serve. Stateless and horizontally scaled behind GSLB/DNS for geo-routing within India.

Catalog / Product Service

Serves product detail data (title, price, attributes, seller, availability summary). Backed by a fast KV store (Aerospike-class) holding the read projection. The durable source of truth is a separate transactional store; updates flow into the KV cache and search index via a change stream. Read-only and aggressively cached — this is the 100:1+ read side.

Search & Discovery Service

Inverted-index cluster (Elasticsearch/Solr-class) powering keyword search, facets, filters, sort, and autocomplete over 100M+ listings. Kept in near-real-time sync with the catalog through CDC/Kafka. Personalization and ranking signals are layered on top; recommendations are precomputed offline (the Flipkart Data Platform / 35PB Hadoop side) and served from a low-latency store.

Cart Service

Stores per-user carts in a low-latency KV store keyed by user id, replicated for durability, so the same cart is available on every device. Holds item references and quantities but does NOT hold inventory: adding to cart is a soft intent, not a reservation. Cart TTLs keep abandoned carts from leaking storage.

Inventory Service (the consistency core)

Owns the strongly-consistent stock ledger per SKU per warehouse. Exposes atomic reserve/commit/release operations with a reservation TTL. A reserve decrements available stock and creates a short-lived hold; commit finalizes on payment success; release returns stock on timeout/cancel. This is where overselling is prevented — every other component treats its number as advisory until reserve succeeds.

Order Management Service (OMS)

Creates orders and drives them through their lifecycle with a state machine / saga orchestrator in the style of Flipkart's Flux. In this design the states are Order Created -> Payment Authorized -> Inventory Committed -> Packed -> Shipped -> Delivered. Each transition is idempotent and durably persisted; failures trigger compensating actions (refund, release inventory) instead of leaving partial state.

Payment Service

Integrates UPI, cards, net banking, wallets, EMI, and cash-on-delivery. Uses idempotency keys so a retried checkout never double-charges, and a two-phase pattern (authorize then capture) so money is only captured once inventory is committed. Calls external PSPs/banks through their APIs and receives asynchronous status updates via webhooks; reconciliation handles async settlement.

Flash-Sale Admission & Hot-Key Layer

In front of the inventory service for deal SKUs: an in-memory atomic counter (Redis/Aerospike) acts as a fast gate so the database isn't hit by the full herd; a FIFO queue smooths bursts into a bounded write rate; per-user and per-SKU rate limits and bot detection shed abusive load. Users beyond capacity get a 'sold out / try again' fast path instead of a hung request.

CDN & Edge

Caches images, static assets, and cacheable PDP fragments at edge locations close to users. Offloads the bulk of bytes from origin and is the first line of defense against a traffic spike — most of a sale page is static.

Event Bus (Kafka)

Backbone for change streams and async workflows: catalog/price/stock changes propagate to caches and search, order events drive notifications and analytics, and inventory mutations are logged for audit. Decouples producers from consumers so a slow downstream never blocks checkout.

Notification Service

Sends order confirmations, shipping updates, and deal alerts over email, SMS, push, and in-app channels. Driven off the event bus, rate-limited, and deduplicated so a retry storm doesn't spam users.

Data model

Pick the right store per table. Justify each choice with the access pattern, not by reflex.

products

product_id (PK), title, brand, category_id, description, attributes (jsonb), default_price, status

Source-of-truth catalog row. Mutated rarely relative to reads. Denormalized into the product KV cache and the search index via CDC. Keep price out of the heavily-cached blob if it changes often, or version it.

listings

listing_id (PK), product_id (FK), seller_id, price, mrp, warehouse_id, is_active

A marketplace has many sellers per product. The buy-box / lowest-price selection happens over listings. Separating listings from products keeps the catalog row stable while prices churn per seller.

inventory

sku_id (PK), warehouse_id (PK), available_qty, reserved_qty, version

The consistency core. available_qty is the only number that matters for oversell. Use an optimistic version column (compare-and-swap) or a serialized atomic decrement. Composite key by warehouse so geo-routing can ship from the nearest stock.

inventory_reservations

reservation_id (PK), sku_id, user_id, qty, status (held/committed/released), expires_at

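Short-lived holds created at checkout start. A background sweeper (or TTL) releases expired holds back to available_qty. Because inventory is keyed by (sku_id, warehouse_id), a hold also needs to record which warehouse row it was taken from so the release goes back to the right row. This is what lets you reserve before payment without permanently losing stock to abandoners.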

carts

user_id (PK), items (jsonb: listing_id, qty), updated_at, ttl

Stored in a KV store, not a relational table at scale. Holds intent only — no stock is reserved here. TTL evicts abandoned carts.

orders

order_id (PK), user_id, status, total_amount, idempotency_key (UNIQUE), created_at

idempotency_key is UNIQUE so a duplicate/retried checkout maps to the same order instead of creating a second one. status is driven by the OMS state machine.

order_items

order_id (FK), sku_id, listing_id, qty, unit_price, reservation_id

Line items snapshot the price at purchase time (never re-read current price) and link to the reservation that guaranteed the stock.

payments

payment_id (PK), order_id (FK), provider, provider_ref, status (authorized/captured/refunded), idempotency_key (UNIQUE), amount

Authorize-then-capture two-phase flow. idempotency_key prevents double-charge on retry. Reconciliation job matches provider webhooks against this table.

Deep dives

These are the conversations the interviewer is steering you toward. Practice each one until you can talk through it without notes.

Preventing oversell: how to atomically decrement inventory under contention

The whole interview hinges on this. Never read-then-write in application code: two requests read available_qty=1, both think they can sell, both decrement, and that's the oversell bug. There are two correct patterns. (1) An atomic conditional decrement: UPDATE inventory SET available_qty = available_qty - 1, version = version + 1 WHERE sku_id = ? AND warehouse_id = ? AND available_qty >= 1. The affected-row count tells you if you won. The database evaluates the guard and applies the decrement as one statement, so there is no gap for a second writer, and this works well when contention is moderate. A strict compare-and-swap would additionally match AND version = ? against the version you read and retry on a miss; for a plain decrement the available_qty guard is enough. (2) For a single hot SKU under flash-sale contention, route all decrements for that key to one serialization point: an atomic counter in Redis/Aerospike (Redis DECR returns the new value; if it goes below zero, reject the request and restore the count) or a single-partition queue so writes are serialized. The database then sees a bounded, ordered stream instead of a stampede. Pair either with a reservation TTL so stock held for a checkout that never pays is returned automatically. State clearly that you accept rejecting a request ('sold out') over overselling: in e-commerce, oversell is a refund, an angry customer, and sometimes a legal/SLA problem; under-serving is just a retry.
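To make pattern (1) concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the inventory store. The table layout follows the data model above; the helper names (make_inventory_db, try_decrement) and the sample SKU and warehouse ids are assumptions for illustration, not anything Flipkart has published.

```python
import sqlite3


def make_inventory_db() -> sqlite3.Connection:
    """Create an in-memory inventory table keyed by (sku_id, warehouse_id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE inventory (
               sku_id        TEXT    NOT NULL,
               warehouse_id  TEXT    NOT NULL,
               available_qty INTEGER NOT NULL,
               version       INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (sku_id, warehouse_id)
           )"""
    )
    return conn


def try_decrement(conn: sqlite3.Connection, sku_id: str,
                  warehouse_id: str, qty: int = 1) -> bool:
    """Take `qty` units in one statement; True only if a row was updated."""
    cur = conn.execute(
        """UPDATE inventory
              SET available_qty = available_qty - ?, version = version + 1
            WHERE sku_id = ? AND warehouse_id = ? AND available_qty >= ?""",
        (qty, sku_id, warehouse_id, qty),
    )
    conn.commit()
    # rowcount is 1 when the guard matched, 0 when stock was insufficient.
    return cur.rowcount == 1


# Hypothetical hot SKU with two units left: the third buyer must lose.
conn = make_inventory_db()
conn.execute("INSERT INTO inventory VALUES ('phone-1', 'blr-1', 2, 0)")
results = [try_decrement(conn, "phone-1", "blr-1") for _ in range(3)]
print(results)  # [True, True, False]
```

The point to say out loud in the interview: the check and the write are the same statement, so the application never holds a stale copy of available_qty.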

The thundering herd: absorbing a flash sale on one SKU

When a deal opens at noon, millions of users hit the same product+inventory key in the same second. You cannot let that reach the inventory database. Layer the defenses: (1) Admission control / virtual waiting room — issue queue tokens at the gateway; only N users at a time get a checkout slot, the rest see a waiting page with their position. (2) Per-user and per-SKU rate limits plus bot detection to shed scripted load. (3) Front the hot key with an in-memory counter so 99% of 'is there stock' checks are answered without touching the DB, and short-circuit to 'sold out' the instant the counter hits zero. (4) Smooth the remaining writes through a FIFO queue into a bounded consumer rate. (5) Cache the PDP itself hard at the CDN so the read flood doesn't melt origin either. The mental model: convert a spike into a queue, answer the impossible-to-satisfy 99% cheaply and instantly, and let only the winners through to the expensive consistent path.
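Defense (3) can be sketched in a few lines. This is an in-process stand-in, assuming a lock-protected integer plays the role that a Redis or Aerospike atomic counter would play in production; the class and function names (HotKeyGate, simulate_sale) are hypothetical.

```python
import threading


class HotKeyGate:
    """In-memory stock counter that answers 'sold out' without touching the DB."""

    def __init__(self, stock: int):
        self._remaining = stock
        self._lock = threading.Lock()

    def try_admit(self) -> bool:
        """Admit one buyer to the consistent checkout path, or reject instantly."""
        with self._lock:
            if self._remaining <= 0:
                return False  # fast 'sold out' path
            self._remaining -= 1
            return True

    def give_back(self) -> None:
        """Return a unit, e.g. when a reservation TTL expires unpaid."""
        with self._lock:
            self._remaining += 1


def simulate_sale(stock: int, buyers: int) -> int:
    """Fire `buyers` concurrent attempts at the gate; return how many got in."""
    gate = HotKeyGate(stock)
    admitted = []

    def attempt(user_id: int) -> None:
        if gate.try_admit():
            admitted.append(user_id)

    threads = [threading.Thread(target=attempt, args=(i,)) for i in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(admitted)


print(simulate_sale(stock=10, buyers=200))  # 10
```

Only the admitted requests go on to the inventory service; everyone else gets an immediate answer, which is the "convert a spike into a queue" idea in miniature.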

Idempotent checkout: exactly one order, exactly one charge

Networks retry. Users double-click. Mobile apps replay requests after a flaky connection. Without idempotency you create duplicate orders and double-charge customers. The fix: the client generates an idempotency key per checkout attempt and sends it on every retry. The order table has a UNIQUE constraint on idempotency_key — the first insert wins, the retry hits the constraint and you return the already-created order instead of making a new one. Carry the same key (or a derived one) into the payment call so the PSP also dedupes. For the multi-step order workflow, derive each step's idempotency key from the saga id + step name (Flipkart's Flux pattern), so if the orchestrator crashes and replays mid-flight, re-executing a step is a no-op rather than a second inventory decrement or a second capture.
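A minimal sketch of the UNIQUE-constraint approach, again using sqlite3 as a stand-in for the orders store. The function names (create_order, step_key) and the key format are assumptions for illustration; a real PSP defines its own idempotency header or field.

```python
import sqlite3
import uuid

orders_db = sqlite3.connect(":memory:")
orders_db.execute(
    """CREATE TABLE orders (
           order_id        TEXT PRIMARY KEY,
           user_id         TEXT NOT NULL,
           idempotency_key TEXT NOT NULL UNIQUE,
           status          TEXT NOT NULL
       )"""
)


def create_order(conn: sqlite3.Connection, user_id: str, idempotency_key: str):
    """Return (order_id, created). A retry with the same key gets the same order."""
    new_id = str(uuid.uuid4())
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?, 'pending')",
                (new_id, user_id, idempotency_key),
            )
        return new_id, True
    except sqlite3.IntegrityError:
        # The first insert already won; hand back the existing order.
        row = conn.execute(
            "SELECT order_id FROM orders WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        return row[0], False


def step_key(saga_id: str, step_name: str) -> str:
    """Derive a per-step key so a replayed saga step dedupes downstream."""
    return f"{saga_id}:{step_name}"


first = create_order(orders_db, "user-42", "checkout-abc")
retry = create_order(orders_db, "user-42", "checkout-abc")
print(first[0] == retry[0], retry[1])  # True False
```

The insert is attempted first and the lookup only happens on conflict, so there is no check-then-insert race between two concurrent retries.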

Distributed transactions: saga / state machine instead of 2PC

Checkout touches inventory, payment, and orders: three services, three data stores. A single ACID transaction (or two-phase commit) across them doesn't scale and creates locks that kill you at BBD load. Use a saga: a sequence of local transactions, each with a compensating action. Reserve inventory -> create order (pending) -> authorize payment -> commit inventory -> capture payment. If payment fails after reserve, the compensation releases the reservation and cancels the pending order. Flipkart built and open-sourced Flux for exactly this: order fulfillment modeled as a state machine (Order Created -> Confirmed -> Packed -> Shipped) with event-driven transitions, durable persistence, retries, and replay. Orchestration (a central coordinator drives the steps) is usually preferred over choreography here because order flows are complex and you want one place to see and recover state. Be explicit that sagas give you eventual consistency with no isolation: design for intermediate states being visible (e.g., an order that's 'pending payment') and make every step idempotent so replay is safe.
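The orchestration idea fits in a short sketch. This is a toy in-process orchestrator, not Flux's API: the step list, function names, and the dict-based context are all assumptions for illustration, and a real orchestrator would persist state durably between steps.

```python
from typing import Callable, Dict, List, Tuple

# (step name, action, compensating action)
Step = Tuple[str, Callable[[Dict], None], Callable[[Dict], None]]


def run_saga(steps: List[Step], ctx: Dict) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action(ctx)
        except Exception:
            ctx["log"].append(f"failed:{name}")
            for done_name, _, compensate in reversed(completed):
                compensate(ctx)
                ctx["log"].append(f"compensated:{done_name}")
            return False
        completed.append(step)
        ctx["log"].append(f"done:{name}")
    return True


# Hypothetical local transactions, each standing in for a service call.
def reserve(ctx): ctx["stock"] = "held"
def release(ctx): ctx["stock"] = "released"

def authorize(ctx):
    if ctx.get("card_declined"):
        raise RuntimeError("payment declined")
    ctx["payment"] = "authorized"

def void_auth(ctx): ctx["payment"] = "voided"
def commit(ctx): ctx["stock"] = "committed"
def uncommit(ctx): ctx["stock"] = "held"


CHECKOUT_STEPS: List[Step] = [
    ("reserve_inventory", reserve, release),
    ("authorize_payment", authorize, void_auth),
    ("commit_inventory", commit, uncommit),
]

failed_ctx = {"log": [], "card_declined": True}
print(run_saga(CHECKOUT_STEPS, failed_ctx), failed_ctx["stock"])  # False released
```

Note what the sketch leaves out on purpose: durable state and idempotent steps. Without those, a crash between a step and its log entry would replay the step twice, which is exactly why the text above insists on idempotency.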

Read path: catalog caching, CQRS, and cache invalidation

The catalog is read ~100-1000x more than it's written, so separate the read model from the write model (CQRS). The source of truth is a transactional store; reads are served from a denormalized KV cache (Aerospike-class) and a search index. Updates flow one way: write to source of truth -> emit a change event on Kafka -> update KV cache and reindex search. The hard part is cache invalidation, especially for price and stock. Strategies: (1) version every product so a stale read can be detected; (2) for stock, do NOT show the exact DB number on the PDP under a flash sale — show 'in stock / few left / sold out' buckets so a slightly stale cache is harmless and you avoid hammering inventory for every page view; (3) use short TTLs plus event-driven invalidation (TTL is your safety net if an invalidation event is missed). Also guard against cache stampede: when a hot key expires, thousands of requests miss simultaneously — use request coalescing (single-flight) or a slightly randomized TTL so they don't all expire at once.
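The stampede guard is easier to explain with code in hand. Below is a minimal single-flight cache with a jittered TTL, assuming an in-process dict as the cache and a caller-supplied loader function; the class name and parameters are hypothetical, and a production version would sit in front of the KV store rather than replace it.

```python
import random
import threading
import time


class SingleFlightCache:
    """Cache where concurrent misses on one key trigger a single load."""

    def __init__(self, loader, base_ttl_s: float = 30.0, jitter: float = 0.2):
        self._loader = loader
        self._base_ttl_s = base_ttl_s
        self._jitter = jitter
        self._values = {}      # key -> (value, expires_at)
        self._key_locks = {}   # key -> lock held while that key is loading
        self._guard = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def _fresh(self, key):
        entry = self._values.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key):
        entry = self._fresh(key)
        if entry is not None:
            return entry[0]
        with self._lock_for(key):
            # Re-check: another request may have loaded it while we waited.
            entry = self._fresh(key)
            if entry is not None:
                return entry[0]
            value = self._loader(key)
            # Randomize the TTL so hot keys do not all expire together.
            ttl = self._base_ttl_s * (1 + random.uniform(-self._jitter, self._jitter))
            self._values[key] = (value, time.monotonic() + ttl)
            return value
```

A usage sketch: wrap the origin fetch, for example `cache = SingleFlightCache(load_pdp_from_origin)` where `load_pdp_from_origin` is whatever hypothetical function reads the source of truth, then call `cache.get(sku_id)` from request handlers. Fifty concurrent misses on the same SKU result in one origin read instead of fifty.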

Cart vs. reservation: why add-to-cart must NOT reserve stock

A common wrong answer is to decrement inventory when a user adds to cart. At BBD scale that's catastrophic: millions of carts would lock up all stock, most of which is never bought, and you'd show 'sold out' to people willing to pay. Cart is intent, not commitment. Stock is only reserved at the start of checkout, with a short TTL (a few minutes), and only committed on successful payment. This keeps inventory liquid. The tradeoff is honesty: between cart and checkout the price or availability can change, so you re-validate price and re-check availability at checkout and surface any change to the user before charging. This is also why the PDP availability is a fuzzy bucket and the cart shows 'we'll confirm availability at checkout' rather than a hard guarantee.
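The reserve / commit / release lifecycle with a TTL can be sketched as a small in-memory ledger. This is a single-threaded illustration under stated assumptions: the class name, the 300-second default hold, and the explicit `now` argument (passed in so expiry is easy to reason about) are all hypothetical, and a real inventory service would make each operation atomic in its store.

```python
import itertools


class InventoryLedger:
    """Single-SKU ledger: holds are created at checkout and expire after a TTL."""

    def __init__(self, available: int, hold_ttl_s: float = 300.0):
        self.available = available
        self.hold_ttl_s = hold_ttl_s
        self.holds = {}  # reservation_id -> (qty, expires_at)
        self._ids = itertools.count(1)

    def sweep(self, now: float) -> None:
        """Return stock from expired holds to the available pool."""
        for rid, (qty, expires_at) in list(self.holds.items()):
            if expires_at <= now:
                self.release(rid)

    def reserve(self, qty: int, now: float):
        """Hold stock at checkout start; None means 'sold out right now'."""
        self.sweep(now)
        if qty > self.available:
            return None
        self.available -= qty
        rid = next(self._ids)
        self.holds[rid] = (qty, now + self.hold_ttl_s)
        return rid

    def commit(self, rid: int, now: float) -> bool:
        """Finalize on payment success; False if the hold already expired."""
        self.sweep(now)
        return self.holds.pop(rid, None) is not None

    def release(self, rid: int) -> None:
        """Give stock back on cancel, payment failure, or timeout."""
        hold = self.holds.pop(rid, None)
        if hold is not None:
            self.available += hold[0]


ledger = InventoryLedger(available=1)
first = ledger.reserve(1, now=0)       # checkout starts, unit is held
blocked = ledger.reserve(1, now=10)    # second buyer sees 'sold out'
second = ledger.reserve(1, now=301)    # first hold expired, unit is liquid again
print(first, blocked, second)  # 1 None 2
```

Nothing in this ledger is touched by add-to-cart, which is the whole argument of this section: stock only moves when a checkout starts.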

Scaling for Big Billion Days: capacity, autoscaling, and graceful degradation

BBD traffic runs 5-10x baseline, so you can't just turn on autoscaling and hope. Flipkart runs on its own Flipkart Cloud Platform (FCP) over Kubernetes and bursts to GCP for peaks; microservices scale independently so the search tier and the checkout tier flex on their own curves. Practices to mention: (1) load-test against synthetic BBD traffic and pre-warm caches and pre-scale ahead of the sale start (cold autoscaling can't react fast enough to an instantaneous spike). (2) Define a degradation ladder: under extreme load, shed non-critical features first (recommendations, reviews, 'people also viewed') to protect the money path (browse -> cart -> checkout). (3) Use circuit breakers and backpressure so a slow downstream (a struggling payment provider) trips fast and the system queues rather than cascades into failure. (4) Run chaos testing in production (kill pods, simulate partial failures) so failover is proven, not hoped for. The thesis: at 10x you don't keep everything up — you choose what to sacrifice in advance.
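Practice (3) is worth being able to whiteboard. Here is a minimal circuit-breaker sketch, assuming a simple consecutive-failure threshold and a fixed cool-down; the class names, thresholds, and the explicit `now` argument are assumptions for illustration rather than any particular library's API.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the downstream."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, now: float):
        """Run fn() unless the circuit is open; track failures to trip it."""
        if self.opened_at is not None and now - self.opened_at < self.reset_after_s:
            raise CircuitOpenError("downstream is failing; shed or queue the request")
        # Past the cool-down the circuit is half-open: allow one trial call.
        half_open = self.opened_at is not None
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if half_open or self.failures >= self.failure_threshold:
                self.opened_at = now  # trip, or re-trip after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In the checkout path you would wrap the payment-provider call in `breaker.call(...)` and treat CircuitOpenError as the signal to queue the order or show a retry message, so a struggling provider costs you a fast rejection instead of a pile of hung threads.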

Trade-offs to discuss

Every senior interviewer expects you to surface at least 3 of these. Pick the decisions, state the alternatives, and justify your choice.

Strong consistency on inventory, eventual consistency on catalog

Overselling a unit is a real-money, real-customer failure, so the inventory ledger gets strong consistency and serialized decrements. The catalog is read-dominated and tolerant of staleness, so it gets cached aggressively and updated eventually. Applying strong consistency everywhere would not scale; applying eventual consistency to inventory would oversell. Pick consistency per-domain, not globally.

Saga / state machine over two-phase commit for checkout

2PC across inventory, payment, and order services creates distributed locks that collapse under BBD write contention. Sagas trade away isolation (intermediate states are visible) for availability and scale, with compensating actions for rollback. The cost is more application-level complexity and the need to make every step idempotent.

Reserve stock at checkout, not at add-to-cart

Reserving at cart locks up inventory that's mostly never purchased, showing false 'sold out' to real buyers. Reserving at checkout with a short TTL keeps stock liquid. The tradeoff is that price/availability can shift between cart and checkout, so you must re-validate at checkout and accept occasionally telling a user the item just went out of stock.

Virtual waiting room / queue instead of pure autoscaling for flash sales

Autoscaling reacts in tens of seconds at best; a flash sale spikes in one second, and the bottleneck is a single hot inventory key that can't be scaled horizontally anyway. Admission control converts the spike into an orderly queue and answers the 99% who can't win instantly and cheaply. The cost is a worse experience for some users (a waiting page), but it's a controlled, fair degradation instead of a meltdown.

Fuzzy availability buckets on the PDP ('few left') instead of exact stock counts

Showing the exact live count would force every page view to read the consistent inventory store, defeating the cache and creating a hot key on reads, not just writes. Buckets make a slightly stale cache harmless. The tradeoff is precision — you can't promise a specific quantity until checkout reserves it.

Authorize-then-capture payments with idempotency keys

Capturing money before inventory is committed risks charging for stock you can't fulfill; idempotency keys prevent double-charge on retries. The two-phase flow adds latency and a reconciliation job for async settlement, but it's the only safe way to keep money and stock in agreement across a distributed checkout.

Separate read store (KV + search index) from write store (transactional DB) — CQRS

One store can't be both a sub-millisecond 90M-QPS read cache and a strongly-consistent transactional ledger. Splitting them lets each scale on its own terms. The cost is the synchronization machinery (CDC/Kafka) and the eventual-consistency window between a write and when reads see it — acceptable for catalog, designed-around for stock.

How Flipkart actually does it

Flipkart runs much of its low-latency serving layer on Aerospike: per Aerospike's published case study, ~90 million QPS aggregate across three data centers and 50+ use cases — the homepage search bar, recommendations, ads, pricing, and inventory — at sub-millisecond latency, on 200+ production clusters managed by a team of fewer than ten engineers via the Aerospike Kubernetes Operator. The platform sits on Flipkart Cloud Platform (FCP), Flipkart's internal Kubernetes-based cloud over private India data centers, with hybrid bursting to GCP during peak events; microservices scale independently and the team runs continuous chaos testing (killing pods, simulating partial failures) to prove failover. For order orchestration, Flipkart open-sourced Flux, a state-machine framework that models fulfillment as states and event-driven transitions with retries and replay — a real, inspectable implementation of the saga pattern this page describes. Big Billion Days traffic is publicly described as 5-10x normal load, and the Flipkart Data Platform runs an 800+ node, 35PB+ Hadoop cluster powering the offline recommendation and analytics side. Note that some specific QPS and percentage figures come from vendor case studies and engineering talks rather than first-party SLAs; treat them as directionally accurate orders of magnitude, not contractual numbers.

Lessons to study before this interview

If any of these topics are fuzzy, the interviewer will catch it. Each lesson is 15 to 60 minutes with diagrams, code, and a quiz.

Design a Payment System (capstone / capstone)
Design a Rate Limiter (capstone / capstone)
Idempotency (foundation / core fundamentals)
Saga Pattern (advanced / distributed systems core)
Distributed Transactions (advanced / distributed systems core)
Cache Stampede Prevention (foundation / caching strategies)
Cache Invalidation (foundation / caching strategies)

Practice with 766 system design lessons

Lifetime access for INR 299 or $5. Interactive diagrams, runnable code, quizzes, and 20 capstone projects including Design Flipkart.

Design Flipkart: System Design Interview Guide

Why this question is asked

Requirements

Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.

Functional requirements

Browse and search a product catalog of 100M+ listings with filters (category, brand, price, rating), facets, and autocomplete
View a product detail page (PDP) with price, ratings, seller, and live availability
Add items to cart; cart persists across sessions and devices
Place an order: address, payment, and a confirmed order id, with no double-charging on retries
Decrement inventory atomically so a unit is never sold twice (no oversell), even under flash-sale contention
Run time-boxed flash sales / deals where a limited-stock SKU opens at a fixed time
Track order lifecycle: created -> payment authorized -> packed -> shipped -> delivered, with cancellations and refunds
Show personalized recommendations and category/deal pages
Let sellers manage listings, price, and stock; reflect stock changes in near real-time
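The "never sold twice" requirement is commonly met with a conditional decrement whose affected-row count is checked. The sketch below uses Python's built-in sqlite3 module purely so it is runnable; the `inventory` table layout and the helper names `make_db` and `try_reserve` are assumptions for illustration, and a production store would differ.

```python
import sqlite3


def make_db(sku_id: str, initial_qty: int) -> sqlite3.Connection:
    """Create an in-memory inventory table with one SKU (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        "sku_id TEXT PRIMARY KEY, available_qty INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO inventory VALUES (?, ?)", (sku_id, initial_qty))
    conn.commit()
    return conn


def try_reserve(conn: sqlite3.Connection, sku_id: str) -> bool:
    """Decrement stock only if a unit is available; True means a unit was taken."""
    cur = conn.execute(
        "UPDATE inventory SET available_qty = available_qty - 1 "
        "WHERE sku_id = ? AND available_qty >= 1",
        (sku_id,),
    )
    conn.commit()
    # rowcount is 1 when the WHERE clause matched, 0 when stock was exhausted.
    return cur.rowcount == 1
```

The check and the decrement happen in one statement, so there is no window between reading the quantity and writing it back. Callers that get `False` should tell the buyer the item is sold out.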

Non-functional requirements

Catalog read path p99 < 150ms; search results p99 < 300ms
Withstand 5-10x baseline traffic during Big Billion Days without manual intervention
Strong consistency on the inventory ledger (never oversell); eventual consistency acceptable for catalog, reviews, recommendations
Checkout must be idempotent: a retried/duplicate request produces exactly one order and one charge
High availability (target 99.95%+) for browse/search; checkout may degrade gracefully (queue) rather than fail
Horizontal scalability — every tier scales independently (search, cart, inventory, orders)
Durability: orders, payments, and inventory mutations are never lost (replicated, persisted before ack)
India-first latency: serve from in-country data centers / CDN edge close to Tier-1 and Tier-2 cities
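It can help to translate the 99.95% availability target into a concrete downtime budget. The helper below is a simple arithmetic sketch; the function name `downtime_budget_minutes` and the choice of a 30-day or 365-day window are assumptions for illustration.

```python
def downtime_budget_minutes(availability_pct: float, period_days: int) -> float:
    """Minutes of allowed downtime in a period for a given availability target."""
    minutes_in_period = period_days * 24 * 60
    return (1 - availability_pct / 100) * minutes_in_period


# 99.95% leaves roughly 21.6 minutes per 30 days,
# or roughly 262.8 minutes (about 4.4 hours) per 365 days.
monthly = downtime_budget_minutes(99.95, 30)
yearly = downtime_budget_minutes(99.95, 365)
```

A budget of about twenty minutes a month is small enough that failover for browse and search has to be automatic rather than operator-driven.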

Back-of-envelope scale estimates

Show your math. Pulling numbers from thin air signals you have not thought about the load.

Registered users / monthly actives: ~500M registered, ~150-200M MAU (est.)

Flipkart is one of India's two largest e-commerce platforms; public figures put the user base in the hundreds of millions. Treat exact MAU as an estimate for sizing.

Catalog size: 100M+ active listings

A horizontal marketplace with millions of sellers easily reaches 9-figure SKU counts across categories. Drives the search index size and catalog cache footprint.

Baseline vs. BBD traffic: 5-10x spike during Big Billion Days

Stated directly in Flipkart engineering interviews/case studies: sale traffic is 5x to 10x normal. This multiplier is the central capacity-planning input.

Read:write ratio on catalog: ~100:1 to 1000:1 reads:writes

A PDP is viewed millions of times between price/stock changes. Justifies heavy caching and a separate read path from the write path.

Aerospike fleet QPS: ~90M QPS aggregate, sub-ms latency

Per Aerospike's published Flipkart case study: ~90M QPS across three DCs and 50+ use cases (search bar, recommendations, ads, pricing, inventory) at sub-millisecond latency, 200+ clusters.

Flash-sale contention on one SKU: 100K-1M+ requests/sec on a single hot key

A limited-stock deal (e.g., a phone) draws the whole audience to one product+inventory key in the same second. This is the classic thundering-herd / hot-key problem to design around.

Checkout write throughput (peak): tens of thousands of order writes/sec

Even a small fraction of browsers converting at peak generates a large, contention-heavy write load on the order and inventory stores.
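One way to show the math behind these estimates is a few lines of arithmetic. In the sketch below, the 90M QPS, 200 clusters, 100M listings, and 5-10x multiplier come from the figures above; `ASSUMED_RECORD_BYTES` and `ASSUMED_BASELINE_RPS` are placeholder assumptions made up for illustration, not Flipkart numbers.

```python
# Figures quoted in the estimates above.
AEROSPIKE_QPS = 90_000_000      # ~90M QPS aggregate
CLUSTERS = 200                  # "200+ clusters", so per-cluster load is an upper bound
LISTINGS = 100_000_000          # 100M+ active listings
BBD_MULTIPLIER = (5, 10)        # 5-10x baseline during Big Billion Days

# Assumptions for illustration only; replace with numbers agreed with the interviewer.
ASSUMED_RECORD_BYTES = 2_000    # hypothetical average cached catalog record size
ASSUMED_BASELINE_RPS = 200_000  # hypothetical normal-day request rate


def estimates() -> dict:
    return {
        # Average load per cluster if traffic were spread evenly.
        "avg_qps_per_cluster": AEROSPIKE_QPS // CLUSTERS,
        # Raw catalog cache footprint in decimal gigabytes, before replication.
        "catalog_cache_gb": LISTINGS * ASSUMED_RECORD_BYTES / 1e9,
        # Low and high ends of the peak request rate to provision for.
        "bbd_peak_rps": tuple(ASSUMED_BASELINE_RPS * m for m in BBD_MULTIPLIER),
    }
```

Under these assumptions the average comes out to 450,000 QPS per cluster, the raw catalog cache to about 200 GB, and the peak to 1M-2M requests/sec. The per-cluster figure is only an average; real clusters serve very different use cases and will not be evenly loaded.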

Design Flipkart: System Design Interview Guide

Why this question is asked

Requirements

Functional requirements

Non-functional requirements

Back-of-envelope scale estimates

High-level architecture

Core components

API Gateway / Load Balancer

Catalog / Product Service

Search & Discovery Service

Cart Service

Inventory Service (the consistency core)

Order Management Service (OMS)

Payment Service

Flash-Sale Admission & Hot-Key Layer

CDN & Edge

Event Bus (Kafka)

Notification Service

Data model

Deep dives

Preventing oversell: how to atomically decrement inventory under contention

The thundering herd: absorbing a flash sale on one SKU

Idempotent checkout: exactly one order, exactly one charge

Distributed transactions: saga / state machine instead of 2PC

Read path: catalog caching, CQRS, and cache invalidation

Cart vs. reservation: why add-to-cart must NOT reserve stock

Scaling for Big Billion Days: capacity, autoscaling, and graceful degradation

Trade-offs to discuss

Strong consistency on inventory, eventual consistency on catalog

Saga / state machine over two-phase commit for checkout

Reserve stock at checkout, not at add-to-cart

Virtual waiting room / queue instead of pure autoscaling for flash sales

Fuzzy availability buckets on the PDP ('few left') instead of exact stock counts

Authorize-then-capture payments with idempotency keys

Separate read store (KV + search index) from write store (transactional DB) — CQRS

How Flipkart actually does it

Lessons to study before this interview

Frequently asked questions

How do you prevent overselling during a flash sale?

Should adding to cart reserve inventory?

Why use a saga instead of a single database transaction for checkout?

How do you make checkout idempotent so users aren't double-charged?

How does Flipkart handle the read-heavy catalog vs. the write-heavy checkout?

What actually happens to traffic during Big Billion Days, and how do you plan for it?

How is inventory kept consistent across multiple warehouses?

Practice with 766 system design lessons
