How does Swiggy decide which restaurants to show for my address?

It runs a serviceability check. Cities are divided into delivery-zone polygons, and Swiggy keeps an in-memory GeoHash index that maps each polygon to the GeoHash cells it overlaps. When you set an address, it computes your GeoHash cell, pulls the small set of zones tied to that cell, runs a precise point-in-polygon test on just those, and returns the restaurants in the matching zone — using actual road distance, not straight-line, to decide if a restaurant can really reach you. This keeps a read that runs tens of thousands of times per second under 100 ms.

Why doesn't Swiggy assign a delivery partner the moment I place the order?

Because the kitchen needs time to cook. If a partner were assigned immediately, they'd sit idle at the restaurant for the entire prep time — wasted supply, especially during the dinner peak when partners are the bottleneck. Swiggy delays assignment until the order is close to ready (using a predicted prep time) so the partner arrives roughly when the food does. Assignment then runs as a batched optimization across the zone every few seconds, not one order at a time.

How does Swiggy match orders to delivery partners at peak?

It's an optimization, not a nearest-partner lookup. The dispatch engine pools the ready-to-assign orders and available partners in a zone and solves a Mixed-Integer Program every few seconds: a cost matrix over (partner x batch) that minimizes total wait and cost across the whole zone, subject to bag capacity, one-batch-per-partner uniqueness, and time windows (arrive after the food's ready, before it's cold). Solving globally per zone beats greedy because it unlocks batching and balances load across many simultaneous orders.

How does order batching work and when does it kick in?

Batching combines two orders onto one trip — either two orders from the same restaurant to nearby drops, or orders from two nearby restaurants to nearby drops. It roughly halves the per-order delivery cost, which is critical during peaks. Inside a batch the system solves a tiny route-ordering (Vehicle Routing) problem to sequence pickups and drops. It only batches when every order still meets its freshness time window and caps batch size small (2-3), so it never sacrifices the first order's experience to save cost.

How does Swiggy handle the lunch and dinner rush?

That concentration is the defining challenge: roughly 70-80% of daily orders land in two ~90-minute windows, with dinner about 29% bigger than lunch in 2024. The system is sized for instantaneous peak (hundreds of orders per second), not the daily average (tens per second). Tactics include scheduled autoscaling ahead of the peak, leaning harder on batching to shed load, surge pricing to flatten demand at the edges and pull partners online, and graceful degradation — widening ETAs and temporarily pausing slammed kitchens rather than letting them build a 90-minute backlog.

Why is the order and payment system strongly consistent but the rest isn't?

Because the costs of staleness differ. Showing a slightly stale restaurant list or a tracking dot that's a second behind is harmless, so discovery, listings, and live tracking run on cached, eventually-consistent, horizontally-scaled read paths. But an order's state and its payment cannot be eventually consistent — that's where double charges, lost orders, and stuck states come from. So Swiggy isolates a small strongly-consistent transactional core (sharded SQL, atomic state transitions, idempotency keys on payment capture) inside a sea of eventually-consistent reads.

Design Swiggy: System Design Interview Guide

Swiggy serves 200,000+ restaurants across 580+ cities with a fleet of 390,000+ delivery partners, and dinner alone drove 215 million orders in 2024 — roughly 29% more than lunch — with demand crushed into two 90-minute windows a day.

Designing Swiggy means solving three coupled problems at once: serviceability (which restaurants can even reach this customer), dispatch (which delivery partner picks up which order, often batched), and a three-party order state machine that survives a restaurant rejecting an order, a partner going offline mid-trip, or a payment webhook arriving late. The hard part is that almost all of the load lands in two short meal peaks, so the system is sized for a sustained peak of roughly 5-6x its daily average (closer to 9-10x in bursts) and sits idle the rest of the day.

Asked at: Swiggy, Zomato, Zepto, Uber, DoorDash, Grab, Rapido, Meesho, and most Indian product companies (Flipkart, PhonePe, Razorpay), typically in SDE2 and SDE3 rounds. It is the canonical hyperlocal logistics / geospatial matching problem for the Indian market, and a favorite because the lunch/dinner peak concentration forces a real capacity conversation.

Why this question is asked

Interviewers reach for Design Swiggy because a generic three-tier diagram falls apart immediately. You have three independent actors (customer, restaurant, delivery partner) whose actions interleave, a geospatial layer that has to answer "can this restaurant serve this address" in well under 100 ms, and an assignment problem that is genuinely an optimization (cost minimization under capacity and time-window constraints), not a lookup. On top of that, the traffic profile is brutal: 70-80% of a day's orders arrive in two narrow windows, so the candidate has to talk about peak provisioning, batching for efficiency, and graceful degradation when a city's kitchens are all slammed at 8:30 PM. It separates people who have only memorized a CDN diagram from people who can reason about a live multi-party system.

Requirements

Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.

Functional requirements

Customer enters a delivery address; system returns only restaurants that can actually serve that address (serviceability)
Customer browses a menu, builds a cart, and places an order with online payment or cash on delivery
Restaurant receives the order on a partner tablet and accepts or rejects it within a short window
System assigns a delivery partner to pick up the order, batching nearby orders where it improves efficiency
Customer sees a live ETA and the delivery partner's location on a map once the order is picked up
Order moves through a strict state machine: placed, confirmed, food being prepared, ready, picked up, on the way, delivered (or canceled / refunded)
Delivery partner app shows the pickup, the drop, optimized route, and earnings for the trip
Customer can rate the restaurant and the delivery partner after delivery
Surge / dynamic delivery fee applied during peak demand when partner supply is tight

Non-functional requirements

Serviceability and restaurant-listing reads under 100 ms at p99 (this is the home-screen hot path)
Dispatch assignment decision within a few seconds of an order becoming ready-to-assign
Live tracking location updates every 4-5 seconds per active trip with sub-second delivery to the customer app
System sized for meal-peak load, roughly 5-6x the daily average when sustained and closer to 9-10x in bursts, concentrated in two 90-minute windows
Strong consistency on order state transitions and payment capture; no double charges, no lost orders
99.9%+ availability for order placement and tracking; degrade gracefully (e.g., wider ETA, fewer batches) rather than fail under peak
Geo-partitioned by city so a hot city does not starve others

Back-of-envelope scale estimates

Show your math. Pulling numbers from thin air signals you have not thought about the load.

Daily orders

~6-7M/day

Reported public figures put Swiggy food delivery in the millions of orders per day in 2024. Use ~6M as a round working number for capacity math; an interviewer cares about the method, not the exact figure.

Peak orders per second

~600-700 OPS

If 70% of ~6M daily orders land in roughly 3 hours of combined lunch+dinner peak, that is ~4.2M orders / ~10,800 s ≈ 390 OPS average across the peak, and bursts inside the peak push the instantaneous rate to ~600-700 OPS. Average over a full day is only ~70 OPS, which is the whole point: provision for the peak, not the mean.

Active delivery partners (peak concurrent)

~150K-200K

390,000+ total partners; peak concurrency around 40-50% during dinner. Each emits a location ping every 4-5 seconds while on an active trip or idle-but-online.

Location pings per second (peak)

~35K-45K/s

~180K online partners / 4.5 s ping interval. Written to an in-memory geo index, never to the durable order store at full fidelity.

Serviceability + listing reads

~50K-100K reads/s at peak

Every home-screen open and address change triggers a point-in-polygon serviceability check plus a restaurant-list fetch. This dwarfs order writes (~600 OPS) by two orders of magnitude, so it is the read path that must be cached aggressively.

Restaurants / cities

200K+ restaurants, 580+ cities

Public 2024 figures. Drives the size of the serviceability polygon index and the per-city sharding strategy.
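The arithmetic behind these estimates can be reproduced in a few lines. This is a minimal sketch: every constant is one of the working assumptions stated above (6M orders/day, 70% peak share, 3 hours of combined peak, 180K online partners as the mid-point of the range, a 4.5 s ping interval), not a measured value.

```python
# Back-of-envelope capacity math. All inputs are the working assumptions
# from the estimates above, not measured production figures.
DAILY_ORDERS = 6_000_000
PEAK_SHARE = 0.70            # fraction of orders inside the two meal peaks
PEAK_SECONDS = 3 * 3600      # ~3 hours of combined lunch + dinner peak
ONLINE_PARTNERS = 180_000    # assumed mid-point of the 150K-200K range
PING_INTERVAL_S = 4.5        # assumed mid-point of the 4-5 s ping cadence


def orders_per_second(orders: float, seconds: float) -> float:
    """Average order rate over a window."""
    return orders / seconds


daily_avg_ops = orders_per_second(DAILY_ORDERS, 24 * 3600)
peak_avg_ops = orders_per_second(DAILY_ORDERS * PEAK_SHARE, PEAK_SECONDS)
pings_per_second = ONLINE_PARTNERS / PING_INTERVAL_S

print(f"daily average : {daily_avg_ops:.0f} orders/s")
print(f"peak average  : {peak_avg_ops:.0f} orders/s")
print(f"peak / average: {peak_avg_ops / daily_avg_ops:.1f}x")
print(f"location pings: {pings_per_second:.0f}/s")
```

Changing PEAK_SHARE or PEAK_SECONDS shows how sensitive the provisioning target is to the peak-concentration assumption, which is the point worth making out loud in the interview.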

High-level architecture

Mentally split Swiggy into four planes that talk through an event backbone (Kafka), because conflating them is where candidates lose the room.

Plane 1 is the discovery / read path. A customer opens the app, the client sends a delivery lat/lng, and the Serviceability service answers "which restaurants can reach you." This is a point-in-polygon (PIP) problem: cities are carved into delivery zones (polygons), and doing a raw PIP check against thousands of polygons per request would be too slow, so Swiggy builds a GeoHash index where each polygon is registered against the GeoHash cells it overlaps. At request time you compute the customer's GeoHash key, fetch the small candidate set of overlapping zones from memory, run the precise PIP only on those, and return the eligible restaurants. The catalog (menus, prices, ratings) is served from a search/listing store (think ElasticSearch + a heavy read cache) because this path runs at 50K-100K reads/s and must stay under 100 ms.

Plane 2 is order placement and the state machine. The customer places an order; an Order service writes the order durably (Aurora/Postgres-class transactional store, sharded by city), captures payment idempotently, and emits an order.placed event. The restaurant's partner tablet, connected over a push channel, receives the order and accepts or rejects. Every transition (placed -> confirmed -> preparing -> ready -> picked_up -> on_the_way -> delivered, plus rejected/canceled/refunded branches) is a guarded finite-state-machine step persisted atomically, with each change appended to an order_events log for audit. This is the strongly-consistent core; it cannot be eventually consistent or you get double charges and lost orders.

Plane 3 is dispatch, the genuinely hard, genuinely interesting part. Assignment does NOT fire greedily the instant an order is placed. The system waits until the order is close to ready (so the partner arrives just as the food is, minimizing both customer wait and partner idle time), pools the orders that are ready-to-assign in a city zone with the partners available nearby, and solves a Mixed-Integer Program every few seconds: a cost matrix of (partner x batch) where the objective minimizes total wait/cost across the zone, subject to capacity (a partner's bag/load limit), uniqueness (one batch to exactly one partner), and time windows (arrive after food is ready, before it gets cold). Batching (combining two orders from one restaurant going to nearby drops, or two nearby restaurants to nearby drops) is what makes peak economics work; it is a small Vehicle Routing Problem inside each batch. The ETA shown to the customer comes from a separate ML model (Swiggy's is a multi-output network that jointly predicts assignment delay, first-mile, kitchen wait, and last-mile).

Plane 4 is live tracking. Once a partner is on a trip, their app pings location every few seconds into an in-memory geo store; a tracking service streams that to the customer app over WebSocket/push and recomputes ETA. Pings are never written at full fidelity to the order DB: they live in Redis-class memory, and a downsampled trace goes to the data lake for analytics.

Across all four planes, Kafka carries the events (order placed, accepted, assigned, picked up, delivered) that fan out to notifications, surge computation, analytics, and the partner-payout pipeline.

In a real interview, sketch this on the whiteboard before diving into any single box.

Core components

Walk through each service. The interviewer wants to hear what each one owns, not just the names.

Serviceability service

Answers 'which restaurants can deliver to this address' in under 100 ms. Cities are modeled as delivery zone polygons; a GeoHash index registers each polygon against the cells it overlaps so a request resolves to a tiny candidate set, then runs precise point-in-polygon only on those. Also computes road-distance (not straight-line) from restaurant to drop to decide eligibility and base delivery fee.

Catalog / listing + search

Serves restaurant lists, menus, prices, photos, and ratings. Read-heavy (50K-100K reads/s at peak), backed by ElasticSearch for search/filter and an aggressive cache (Redis/CDN) for the home feed. Personalization and ranking layered on top. Decoupled from the transactional order store.

Order service + state machine

The strongly-consistent core. Persists the order, captures payment idempotently, and enforces the finite state machine across all three actors. Each transition is atomic and appended to an immutable order_events log. Sharded by city; this is where double-charge and lost-order bugs live, so it gets the strictest consistency guarantees.

Restaurant partner channel

Push connection (long-lived socket / FCM fallback) to the restaurant tablet. Delivers new orders, collects accept/reject, and surfaces prep-time signals (the orders-placed vs orders-prepared ratio that feeds kitchen-stress into the ETA model). A reject triggers customer notification and refund flow.

Dispatch / assignment engine

Runs the optimization. Pools ready-to-assign orders and nearby available partners per zone and solves a Mixed-Integer Program every few seconds — cost matrix over (partner x batch), minimizing total cost/wait under capacity, uniqueness, and time-window constraints. Owns just-in-time assignment timing and order batching (a small VRP per batch).

Geo / location index

In-memory store of every online partner's live position, keyed by GeoHash/H3 cell so proximity queries are cheap. Ingests 35K-45K pings/s at peak via Kafka. Feeds both dispatch (find nearby idle partners) and tracking. Never durably persisted at full fidelity.

ETA prediction service

ML model that jointly predicts five quantities via a multi-input/multi-output network: four legs, namely O2A (order-to-assignment), first-mile, kitchen wait, and last-mile, plus the end-to-end order-to-reach time. Features: restaurant type, item count/complexity, live kitchen stress, partner availability in the zone, historical road speeds, live GPS. Optimizes not just MAE but the rate of jarring ETA 'bumps' shown to the customer.

Live tracking service

Streams partner location and live ETA to the customer app over WebSocket/push at 4-5 s cadence. Reads from the in-memory geo index, not the order DB. Handles GPS gaps and partner disconnects by holding the last known position and widening the ETA.

Surge / pricing service

A streaming job (Flink-class) aggregates demand (orders in a zone) vs supply (available partners in a zone) per minute per GeoHash cell, publishes a smoothed, clamped delivery-fee multiplier to a cache, and the order service reads it at checkout. Locked in at order time so the customer pays what they agreed to.

Notification + event backbone

Kafka carries order.placed, .confirmed, .assigned, .picked_up, .delivered events that fan out to push notifications (customer + restaurant + partner), surge computation, partner-payout accrual, and the analytics lake. Decouples slow side-effects from the synchronous order path.

Data model

Pick the right store per table. Justify each choice with the access pattern, not by reflex.

restaurants

restaurant_id (PK), name, lat, lng, city_id, zone_id, prep_time_p50_minutes, is_online, rating, geohash

Listing/discovery data. Lives in the catalog store + search index, heavily cached. geohash and zone_id let serviceability narrow the candidate set fast. is_online flips frequently (kitchen busy, closed, out of an item).

delivery_zones

zone_id (PK), city_id, polygon (geometry), geohash_cells[], is_active

The serviceability polygons. polygon is the precise boundary used for point-in-polygon; geohash_cells is the denormalized list of cells the polygon overlaps, used to build the in-memory GeoHash index that avoids scanning every polygon per request.

orders

order_id (PK), customer_id, restaurant_id, zone_id, state, items (jsonb), subtotal_cents, delivery_fee_cents, surge_multiplier, payment_id, placed_at, delivered_at

The transactional core. Sharded by city_id / zone_id. state is the current FSM state. Strong consistency required. Append-mostly; mutations are state transitions, each mirrored into order_events.

order_events

event_id (PK), order_id (FK), from_state, to_state, actor (customer|restaurant|partner|system), reason, created_at

Immutable audit log of every state transition. Lets you reconstruct exactly who did what when — essential for disputes, refunds, and replaying state if the orders row is ever corrupted.

delivery_partners

partner_id (PK), name, vehicle_type, is_online, current_zone_id, bag_capacity, rating

Static-ish partner profile. Live position is NOT here — it lives in the in-memory geo index. bag_capacity feeds the dispatch capacity constraint. is_online and current_zone_id scope the candidate pool for assignment.

partner_locations

partner_id (PK), lat, lng, heading, geohash, updated_at

In-memory only (Redis/custom geo service), keyed by geohash so proximity queries are O(small). Overwritten every 4-5 s. A downsampled trace (one point per ~30 s, completed trips only) is logged to the data lake; idle pings are discarded on expiry.

assignments

assignment_id (PK), partner_id, batch_id, order_ids[], assigned_at, pickup_eta, drop_eta, status

Output of the dispatch optimizer. A batch can hold multiple orders (the batching case). status tracks accepted/picked_up/completed/reassigned. Reassignment (partner cancels / goes unreachable) creates a new assignment row and re-enters the order into the next optimization round.

payments

payment_id (PK), order_id (FK), amount_cents, method (upi|card|cod|wallet), status, idempotency_key, provider_ref

Strongly consistent. Indexed by order_id and idempotency_key so a retried capture never double-charges. UPI dominates in the Indian market; COD bypasses pre-capture and reconciles on delivery. Provider webhooks update status asynchronously.
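To make the idempotency_key behaviour concrete, here is a minimal in-memory sketch. The class name PaymentStore, the exception, and the provider_calls counter are hypothetical illustrations; in a real system the dictionary would be a unique index on idempotency_key in the transactional store and the counter would be an actual call to the payment provider.

```python
class IdempotencyConflict(Exception):
    """Raised when a key is reused for a different order or amount."""


class PaymentStore:
    """Toy stand-in for the payments table (hypothetical, in-memory only)."""

    def __init__(self):
        self._by_key = {}        # plays the role of a unique index on idempotency_key
        self.provider_calls = 0  # counts how often the provider would be charged

    def capture(self, idempotency_key, order_id, amount_cents):
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            same_request = (existing["order_id"], existing["amount_cents"]) == (
                order_id,
                amount_cents,
            )
            if not same_request:
                raise IdempotencyConflict(idempotency_key)
            # Retry of the same request: return the stored result, charge nothing.
            return existing

        # First time this key is seen: this is where the provider would be called.
        self.provider_calls += 1
        record = {
            "order_id": order_id,
            "amount_cents": amount_cents,
            "status": "captured",
            "idempotency_key": idempotency_key,
        }
        self._by_key[idempotency_key] = record
        return record
```

A client that times out and retries with the same key gets the original record back, so the customer is charged once no matter how many retries arrive.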

Deep dives

These are the conversations the interviewer is steering you toward. Practice each one until you can talk through it without notes.

Serviceability: answering 'can this restaurant reach me' in under 100 ms

This is the home-screen hot path and runs at 50K-100K reads/s, far above order writes, so it must be fast and cacheable. Cities are carved into delivery zone polygons. The naive approach — run point-in-polygon for the customer's location against every polygon — is O(zones) per request and dies at scale. Swiggy's published approach builds a GeoHash index: pick a GeoHash resolution, and for each zone, register it against every GeoHash cell its polygon overlaps. At request time you compute the customer's GeoHash cell key, fetch from memory the small set of zones associated with that cell, then run precise PIP only on that handful. From the matched zone(s) you do a directional discovery of restaurant clusters and return the eligible list. Two refinements interviewers love: (1) eligibility uses actual road distance the partner will travel, not straight-line haversine, because a river or highway can make a 1 km haversine into a 6 km drive; (2) the whole result is cacheable per GeoHash cell for a short TTL, so two customers in the same cell share the computation.
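The lookup flow above (cell key, candidate zones, precise PIP) can be sketched as follows. This is an illustration under stated assumptions, not Swiggy's implementation: a fixed lat/lng grid stands in for GeoHash cells, zones are registered against every cell their bounding box touches (a superset of the cells the polygon truly overlaps), and the road-distance check is left out. All function names are hypothetical.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # assumed cell size; a stand-in for choosing a GeoHash precision


def cell_key(lat, lng):
    """Grid cell containing the point (plays the role of the GeoHash key)."""
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))


def point_in_polygon(lat, lng, polygon):
    """Ray casting: toggle 'inside' for every polygon edge the ray crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lng1 = polygon[i]
        lat2, lng2 = polygon[(i + 1) % n]
        if (lng1 > lng) != (lng2 > lng):
            lat_cross = lat1 + (lng - lng1) * (lat2 - lat1) / (lng2 - lng1)
            if lat < lat_cross:
                inside = not inside
    return inside


def build_index(zones):
    """Map each cell to the zones whose bounding box touches it."""
    index = defaultdict(list)
    for zone_id, polygon in zones.items():
        lats = [p[0] for p in polygon]
        lngs = [p[1] for p in polygon]
        row_lo, col_lo = cell_key(min(lats), min(lngs))
        row_hi, col_hi = cell_key(max(lats), max(lngs))
        for row in range(row_lo, row_hi + 1):
            for col in range(col_lo, col_hi + 1):
                index[(row, col)].append(zone_id)
    return index


def serviceable_zones(lat, lng, zones, index):
    """Fetch candidates for the point's cell, then run precise PIP on just those."""
    candidates = index.get(cell_key(lat, lng), [])
    return [z for z in candidates if point_in_polygon(lat, lng, zones[z])]


zones = {
    "zone_a": [(12.90, 77.60), (12.90, 77.64), (12.94, 77.64), (12.94, 77.60)],
    "zone_b": [(12.95, 77.60), (12.95, 77.64), (12.99, 77.64), (12.99, 77.60)],
}
index = build_index(zones)
```

With the index built once at startup (and rebuilt when zones change), each request touches one dictionary entry and a handful of polygons instead of every zone in the city.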

Just-in-time dispatch and the assignment optimization

The single most common mistake is assigning a partner the instant the order is placed. If the kitchen needs 20 minutes, the nearest partner is 2 minutes away, and you assign immediately, the partner sits idle at the restaurant for 18 minutes: wasted supply during a peak when supply is the bottleneck. So assignment is deliberately delayed until the order is close to ready, using the predicted prep time. When a pool of orders in a zone becomes ready-to-assign, the engine builds a cost matrix over (available partner x candidate batch) and solves a Mixed-Integer Program every few seconds. The objective minimizes total cost / wait across the zone (a weighted sum of partner idle time, customer wait, and travel cost). Constraints: capacity (a partner can't carry more than their bag/load limit), uniqueness (a batch goes to exactly one partner, a partner gets at most one batch this round), and time windows (arrive after the food is ready but before it cools). This is solved with a MIP/linear-sum-assignment solver, not a greedy nearest-partner loop, because greedy is locally fine but globally leaves money on the table across a zone with hundreds of simultaneous orders. The key talking point: batch-and-optimize per zone every few seconds, don't decide per-order in isolation.
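A toy example shows why solving the zone globally beats a greedy loop. This sketch uses exhaustive search over one-to-one assignments, which only works for a handful of partners; a production system would use a MIP or assignment solver (for instance SciPy's linear_sum_assignment for the pure assignment case). The cost values and the INFEASIBLE marker for a violated capacity or time-window constraint are made up for illustration.

```python
from itertools import permutations

INFEASIBLE = float("inf")  # marks a pair that violates capacity or a time window


def greedy_assign(cost):
    """Each batch in turn grabs its cheapest partner that is still free."""
    free = set(range(len(cost)))
    plan, total = {}, 0.0
    for batch in range(len(cost[0])):
        partner = min(sorted(free), key=lambda p: cost[p][batch])
        free.remove(partner)
        plan[batch] = partner
        total += cost[partner][batch]
    return total, plan


def optimal_assign(cost):
    """Exhaustive search over one-to-one assignments (toy sizes only)."""
    n_partners, n_batches = len(cost), len(cost[0])
    best_total, best_plan = INFEASIBLE, None
    for partners in permutations(range(n_partners), n_batches):
        total = sum(cost[p][b] for b, p in enumerate(partners))
        if total < best_total:
            best_total, best_plan = total, dict(enumerate(partners))
    return best_total, best_plan


# cost[partner][batch]: hypothetical combined wait/travel cost in minutes.
cost = [
    [4, 5],
    [6, 20],
    [9, INFEASIBLE],  # partner 2 cannot take batch 1 (e.g. bag too small)
]
greedy_total, greedy_plan = greedy_assign(cost)
best_total, best_plan = optimal_assign(cost)
```

Greedy gives batch 0 its nearest partner and then has to pay a high cost for batch 1; the global solve accepts a slightly worse match for batch 0 to get a much cheaper total.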

Order batching and the Vehicle Routing Problem inside it

Batching is how Swiggy makes peak economics work — one trip carrying two orders roughly halves the per-order delivery cost. Two batchable cases: (a) two orders from the SAME restaurant going to nearby drops (pick up once, drop twice), and (b) orders from two NEARBY restaurants to nearby drops (pick up twice, drop twice). Once a batch is formed, you have a small Vehicle Routing / TSP problem: in what sequence does the partner visit the pickups and drops to minimize total time without letting the first order get cold while fetching the second? Constraints make it tractable: batch sizes are tiny (2, occasionally 3), so you can brute-force the few route permutations in milliseconds. The danger is over-batching during a peak: stuffing three orders on one partner to save cost can blow the ETA on the first order and tank the customer experience. So batching is gated by the same time-window constraint as assignment — a batch is only valid if every order in it still meets its freshness window.
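Because batches are so small, the route ordering can be brute-forced. The sketch below enumerates every stop sequence, rejects any where a drop precedes its pickup or an order exceeds its freshness window, and keeps the fastest valid one. The planar coordinates in km, the constant speed, and the function names are simplifying assumptions for illustration; a real system would use road travel times.

```python
import math
from itertools import permutations

SPEED_KM_PER_MIN = 0.4  # assumption: ~24 km/h average city speed


def travel_min(a, b):
    """Straight-line travel time between two (x, y) points given in km."""
    return math.dist(a, b) / SPEED_KM_PER_MIN


def best_route(start, orders):
    """orders: {order_id: (pickup_xy, drop_xy, max_minutes_from_pickup_to_drop)}.

    Returns (total_minutes, stop_sequence), or None if no sequence keeps
    every order inside its freshness window (the batch is then invalid).
    """
    stops = [(oid, kind) for oid in orders for kind in ("pickup", "drop")]
    best = None
    for seq in permutations(stops):
        pos, clock, picked_at, valid = start, 0.0, {}, True
        for oid, kind in seq:
            pickup_xy, drop_xy, max_fresh_min = orders[oid]
            if kind == "drop" and oid not in picked_at:
                valid = False  # cannot drop food that was never picked up
                break
            target = pickup_xy if kind == "pickup" else drop_xy
            clock += travel_min(pos, target)
            pos = target
            if kind == "pickup":
                picked_at[oid] = clock
            elif clock - picked_at[oid] > max_fresh_min:
                valid = False  # this order would arrive cold
                break
        if valid and (best is None or clock < best[0]):
            best = (clock, list(seq))
    return best


# Two orders from the same restaurant at (1, 0) going to nearby drops.
same_restaurant_batch = {
    "A": ((1.0, 0.0), (3.0, 0.0), 12.0),
    "B": ((1.0, 0.0), (3.0, 1.0), 12.0),
}
result = best_route((0.0, 0.0), same_restaurant_batch)
```

A batch of 2 orders has 24 stop permutations and a batch of 3 has 720, so exhaustive search stays in the millisecond range, which is why the small batch cap keeps this tractable.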

Decomposed ETA prediction (and why a single number is wrong)

A naive ETA predicts one number end-to-end. Swiggy decomposes delivery time as Max(assignment_delay + first_mile, prep_time) + last_mile. The Max captures that the kitchen cooking and the partner riding to the restaurant happen IN PARALLEL, so the binding constraint is whichever finishes later. Concretely the model predicts five interdependent quantities: four legs, namely O2A (order-to-assignment), first-mile (partner to restaurant), kitchen wait, and last-mile (restaurant to customer), plus the end-to-end order-to-reach time. These are trained jointly with a multi-input/multi-output network rather than five separate models, because the legs are coupled: the dispatch engine deliberately times assignment so the partner arrives as food is ready, so O2A and first-mile depend on each other. Features include restaurant type (cloud kitchen vs dine-in), item count and prep complexity, live kitchen stress (orders placed vs prepared ratio), partner availability in the zone, historical road speeds around the locations, and live GPS pings. Beyond minimizing mean absolute error, the team explicitly tracks 'inaccurate bumps' (sudden ETA jumps that make the customer anxious), because a stable-but-slightly-wrong ETA beats a jittery one.
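The decomposition is easy to check with numbers. A minimal sketch of the formula, with a hypothetical function name and made-up leg durations in minutes:

```python
def order_to_reach_min(assignment_delay, first_mile, prep_time, last_mile):
    """Max(assignment_delay + first_mile, prep_time) + last_mile, all in minutes.

    The partner leg (assignment delay + ride to the restaurant) and the
    kitchen leg (prep time) run in parallel; the slower one gates pickup.
    """
    return max(assignment_delay + first_mile, prep_time) + last_mile


# Kitchen-bound order: the partner arrives after 8 min, the food takes 20.
kitchen_bound = order_to_reach_min(
    assignment_delay=2, first_mile=6, prep_time=20, last_mile=12
)

# Partner-bound order: the food is ready at 15 min, the partner arrives at 18.
partner_bound = order_to_reach_min(
    assignment_delay=10, first_mile=8, prep_time=15, last_mile=12
)
```

In the first case shaving the first mile changes nothing because the kitchen is the bottleneck; in the second, faster cooking changes nothing. A single end-to-end number hides which leg is binding.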

The three-party order state machine and failure branches

The order FSM has a clean happy path — placed -> confirmed (restaurant accepts) -> preparing -> ready -> picked_up -> on_the_way -> delivered — but the interview is really about the unhappy branches, because three independent humans can each break the flow. Restaurant rejects (out of an item, too busy): order goes to rejected, payment auto-refunds, customer is notified, and if it was a batch, the batch is re-optimized. No partner accepts / assigned partner cancels: the order re-enters the next dispatch round; after N failures or a timeout, escalate (surge the fee, widen the radius, or cancel with refund). Partner goes unreachable mid-trip (phone dies, GPS gap): tracking holds last-known position and widens ETA; if no ping for a threshold, ops/auto-reassign kicks in. Payment webhook arrives late: the order can sit in a pending_payment state with a timeout, and idempotency keys ensure a delayed-then-retried capture never double-charges. Every transition is guarded (you can't go from preparing straight to delivered) and written atomically with an entry in order_events, so the system can always answer 'what state is this order in and how did it get there.'
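A guarded transition table is the simplest way to express this. The sketch below is in-memory and illustrative: the exact set of allowed cancel/reject edges is an assumption, the pending_payment state is omitted for brevity, and in a real store the state update and the order_events insert would happen in one database transaction.

```python
from dataclasses import dataclass, field

# Allowed edges of the order FSM. The happy path follows the text above;
# which states may be canceled is an assumption for illustration.
ALLOWED = {
    "placed": {"confirmed", "rejected", "canceled"},
    "confirmed": {"preparing", "canceled"},
    "preparing": {"ready", "canceled"},
    "ready": {"picked_up", "canceled"},
    "picked_up": {"on_the_way"},
    "on_the_way": {"delivered"},
    "rejected": {"refunded"},
    "canceled": {"refunded"},
    "delivered": set(),
    "refunded": set(),
}


class IllegalTransition(Exception):
    pass


@dataclass
class Order:
    order_id: str
    state: str = "placed"
    events: list = field(default_factory=list)  # stands in for order_events


def transition(order, to_state, actor, reason=""):
    """Apply one guarded step and append the matching audit event."""
    if to_state not in ALLOWED[order.state]:
        raise IllegalTransition(f"{order.state} -> {to_state}")
    order.events.append(
        {
            "from_state": order.state,
            "to_state": to_state,
            "actor": actor,
            "reason": reason,
        }
    )
    order.state = to_state
```

Because every change goes through transition, the events list always explains how the order reached its current state, and an attempt to jump from preparing straight to delivered fails loudly instead of corrupting the row.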

Surviving the lunch/dinner peak (the part most candidates skip)

This is the requirement that makes Swiggy distinct from Uber. ~70-80% of a day's orders land in two ~90-minute windows; dinner peak in 2024 was ~29% larger than lunch. Daily-average sizing (~70 OPS) is irrelevant — you must provision for instantaneous peak (~600-700 OPS) and a partner pool that's 40-50% concurrent. Concrete tactics: (1) Autoscale the stateless order, listing, and tracking services ahead of the peak on a schedule, not reactively, because reactive scaling lags the 8 PM cliff. (2) Lean on batching harder during peak — it's both an efficiency win and a load-shedding lever, because each batch is one trip instead of two. (3) Apply surge to flatten demand at the edges of the peak and pull more partners online. (4) Degrade gracefully under extreme load: widen ETAs, temporarily mark slammed kitchens unavailable rather than letting them accumulate a 90-minute backlog, and shed non-critical work (defer analytics, recommendations) to protect the order path. (5) The dispatch optimizer's run cadence and zone size are tunable knobs — smaller zones and faster rounds during peak. The honest framing for an interviewer: this system is overprovisioned and idle most of the day, and that's an accepted cost of hyperlocal food delivery.

Trade-offs to discuss

Every senior interviewer expects you to surface at least 3 of these. Pick the decisions, state the alternatives, and justify your choice.

GeoHash polygon index vs raw point-in-polygon vs PostGIS

Raw PIP against every zone is simplest but O(zones) per request and too slow at 50K-100K reads/s. PostGIS with a spatial index works but adds DB load on the hottest read path. An in-memory GeoHash index (precompute which cells each polygon overlaps, look up by cell, then PIP only the few candidates) keeps the request in memory and sub-100ms. Cost: you must rebuild the index when zones change and pick a GeoHash resolution that balances candidate-set size against memory. Swiggy chose the in-memory GeoHash approach.

Just-in-time delayed assignment vs assign-on-order-placed

Assigning immediately is simpler and feels faster, but it pins a partner idle at the restaurant for the entire prep time — catastrophic when supply is the peak bottleneck. Delaying assignment until the order is near-ready, driven by predicted prep time, keeps partners productive but risks under-supply if the prediction is wrong and no partner is free when the food is ready. The delayed approach wins because partner idle time is the dominant cost lever at peak; you mitigate the risk by widening the candidate pool as readiness approaches.
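The idle-time argument can be put in numbers. A small sketch with hypothetical function names; buffer_min is an assumed safety margin so the partner arrives slightly before the food rather than slightly after.

```python
def assignment_delay_min(prep_time_min, first_mile_min, buffer_min=1.0):
    """Minutes to hold the order after placement before releasing it to dispatch."""
    return max(0.0, prep_time_min - first_mile_min - buffer_min)


def partner_idle_min(prep_time_min, first_mile_min, delay_min):
    """Minutes the partner waits at the restaurant before the food is ready."""
    arrival = delay_min + first_mile_min
    return max(0.0, prep_time_min - arrival)


# Assign-on-placement: 20 min prep, partner 2 min away, no delay.
idle_immediate = partner_idle_min(20, 2, delay_min=0)

# Just-in-time: hold the order so the partner arrives about 1 min early.
delay = assignment_delay_min(20, 2, buffer_min=1.0)
idle_jit = partner_idle_min(20, 2, delay_min=delay)
```

The risk named above shows up as soon as the predicted prep time is too high: the partner then arrives after the food is ready, which is why the prediction quality and the buffer matter.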

MIP / global optimization per zone vs greedy nearest-partner

Greedy (assign each order to its nearest free partner) is trivial to build and reason about, but it's locally optimal and globally wasteful — across a zone with hundreds of simultaneous orders it leaves batching and total-cost gains unrealized. A Mixed-Integer Program over the (partner x batch) cost matrix finds a near-global optimum every few seconds. Cost: solver complexity, a hard latency budget, and the need to bound problem size per zone. At Swiggy's order density the MIP pays for itself; a tiny new market could start greedy.

Batch orders aggressively vs single-order trips

Batching roughly halves per-order delivery cost and sheds load during peak (fewer trips), which is why it's essential to the economics. But over-batching delays the first order in the batch and degrades freshness and CX. The resolution is to allow batching only when every order in the batch still satisfies its freshness/time-window constraint, and to cap batch size small (2-3). Efficiency is bounded by experience, not maximized blindly.

In-memory geo index for live location vs durable writes

Writing 35K-45K pings/s to the durable order store would crush it and almost nothing reads historical pings in real time. Keeping live positions only in an in-memory GeoHash-keyed store gives cheap proximity queries and cheap writes; a downsampled trace goes to the data lake for analytics. Cost: a node failure loses live positions, but partners re-ping within seconds, so the index self-heals — an acceptable trade for the throughput.
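A minimal sketch of such an index, assuming a fixed lat/lng grid in place of GeoHash or H3 cells and a TTL after which a silent partner is treated as offline. The class and parameter names are hypothetical; a production version would live in Redis or a dedicated geo service.

```python
import math


class LiveLocationIndex:
    """In-memory latest-position store, bucketed by grid cell (illustrative)."""

    def __init__(self, ttl_s=15.0, cell_deg=0.01):
        self.ttl_s = ttl_s        # assumed: pings older than this are ignored
        self.cell_deg = cell_deg  # assumed cell size, stand-in for GeoHash precision
        self._latest = {}         # partner_id -> (lat, lng, cell, updated_at)
        self._cells = {}          # cell -> set of partner_ids

    def _cell(self, lat, lng):
        return (math.floor(lat / self.cell_deg), math.floor(lng / self.cell_deg))

    def update(self, partner_id, lat, lng, now):
        """Overwrite the partner's position; move them between cells if needed."""
        cell = self._cell(lat, lng)
        previous = self._latest.get(partner_id)
        if previous is not None and previous[2] != cell:
            self._cells[previous[2]].discard(partner_id)
        self._cells.setdefault(cell, set()).add(partner_id)
        self._latest[partner_id] = (lat, lng, cell, now)

    def nearby(self, lat, lng, now):
        """Partners with a fresh ping in the 3x3 block of cells around the point."""
        row, col = self._cell(lat, lng)
        found = []
        for r in (row - 1, row, row + 1):
            for c in (col - 1, col, col + 1):
                for pid in self._cells.get((r, c), ()):
                    if now - self._latest[pid][3] <= self.ttl_s:
                        found.append(pid)
        return sorted(found)
```

Losing this structure on a node failure costs only a few seconds of staleness: the next round of pings repopulates it, which is the self-healing property described above.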

Strong consistency on orders/payments vs eventual everywhere

Discovery, listings, and tracking tolerate eventual consistency and stale caches fine. The order FSM and payment capture cannot — eventual consistency there means double charges, lost orders, or an order stuck between two states. So you split the system: eventually-consistent, cached, horizontally-scaled read planes around a small strongly-consistent transactional core (sharded SQL with atomic transitions and idempotency keys). Don't pay for strong consistency where you don't need it; never skip it on money and order state.

Surge to flatten peak demand vs fixed pricing

Fixed delivery fees are simpler and feel fairer to customers, but during the dinner cliff they leave demand unbounded while partner supply is fixed, producing long ETAs and failed assignments. A smoothed, clamped surge multiplier shaves demand at the peak's edges and pulls more partners online. Cost: customer friction and the risk of a runaway feedback loop, mitigated by smoothing over a window, capping the multiplier, and locking the price in at order time.
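The three mitigations (smoothing, capping, locking) fit in a few lines. This is a sketch with assumed parameters: the demand/supply ratio as the raw signal, an exponential smoothing factor alpha of 0.3, and a cap of 2.0 are illustrative choices, not published Swiggy values.

```python
def surge_multiplier(demand, supply, previous, alpha=0.3, cap=2.0):
    """Smoothed, clamped delivery-fee multiplier for one zone and one tick.

    demand:   orders in the zone during the window
    supply:   available partners in the zone
    previous: multiplier published on the previous tick
    """
    raw = demand / max(supply, 1)                     # avoid division by zero
    target = min(max(raw, 1.0), cap)                  # clamp to [1.0, cap]
    smoothed = previous + alpha * (target - previous) # move part of the way
    return round(min(max(smoothed, 1.0), cap), 2)


def locked_fee_cents(base_fee_cents, multiplier):
    """Fee stored on the order at checkout; later surge changes do not touch it."""
    return round(base_fee_cents * multiplier)


tick_1 = surge_multiplier(demand=300, supply=100, previous=1.0)
tick_2 = surge_multiplier(demand=300, supply=100, previous=tick_1)
cooling = surge_multiplier(demand=50, supply=100, previous=tick_1)
fee = locked_fee_cents(4000, tick_1)
```

Even with demand at three times supply, the multiplier climbs gradually toward the cap instead of jumping, and it decays just as gradually once the zone calms down, which damps the feedback loop.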

How Swiggy actually does it

Swiggy's engineering blog (Swiggy Bytes) documents most of this directly. Their serviceability platform really does use a GeoHash index over delivery-zone polygons with a point-in-polygon resolution step, computing actual road distance rather than straight-line.

Dispatch is framed as a Mixed-Integer Program: a cost matrix matching delivery partners to batches, minimizing total wait/cost subject to capacity, uniqueness, and time-window constraints, re-solved every few seconds, with order batching modeled as a small Vehicle Routing Problem.

The ETA system decomposes delivery time as max(assignment_delay + first_mile, prep_time) + last_mile and predicts five legs (O2A, first-mile, kitchen wait, last-mile, order-to-reach) jointly via a multi-input/multi-output neural network that evolved from gradient-boosted trees; it optimizes for both MAE and the rate of jarring ETA 'bumps.'

The backbone is Kafka, with transactional data in an Aurora/Postgres-class store and catalog/search in ElasticSearch.

Scale figures cited (200K+ restaurants, 580+ cities, 390K+ delivery partners, dinner ~29% above lunch with ~215M dinner orders in 2024) are from Swiggy's own 2024 year-in-review and press. Order-per-second numbers here are estimates derived from public daily-order figures and the peak-concentration assumption; treat them as back-of-envelope, which is exactly what an interviewer wants you to show.
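
The ETA decomposition quoted above is easy to express directly. The function below is a small sketch of that formula only; the function name and the minute-based units are assumptions, and in the described system each input would come from the prediction model rather than being passed in by hand.

```python
def order_eta_minutes(assignment_delay, first_mile, prep_time, last_mile):
    """ETA = max(assignment_delay + first_mile, prep_time) + last_mile.

    The partner's arrival at the restaurant and the kitchen's prep run in
    parallel, so the pickup happens when the slower of the two finishes.
    """
    pickup_ready = max(assignment_delay + first_mile, prep_time)
    return pickup_ready + last_mile


# Kitchen is the bottleneck: partner arrives at minute 8, food is ready at 15.
kitchen_bound = order_eta_minutes(assignment_delay=2, first_mile=6, prep_time=15, last_mile=10)

# Partner is the bottleneck: food is ready at minute 10, partner arrives at 17.
partner_bound = order_eta_minutes(assignment_delay=5, first_mile=12, prep_time=10, last_mile=8)
```

The max() is why a single end-to-end number is misleading: shaving partner travel time does nothing for the customer while the kitchen is the slower leg, and vice versa.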

Lessons to study before this interview

If any of these topics are fuzzy, the interviewer will catch it. Each lesson is 15 to 60 minutes with diagrams, code, and a quiz.

Design a Notification System

capstone / capstone

Load Balancing

foundation / core fundamentals

Idempotency

foundation / core fundamentals

Distributed Locks

advanced / distributed systems core

Cache-Aside Pattern

foundation / caching strategies

High Availability

advanced / reliability resilience

Rate Limiting for Resilience

advanced / reliability resilience

Frequently asked questions

Practice with 766 system design lessons

Lifetime access for INR 299 or $5. Interactive diagrams, runnable code, quizzes, and 20 capstone projects including Design Swiggy.

Design Swiggy: System Design Interview Guide

Why this question is asked

Requirements

Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.

Functional requirements

Customer enters a delivery address; system returns only restaurants that can actually serve that address (serviceability)
Customer browses a menu, builds a cart, and places an order with online payment or cash on delivery
Restaurant receives the order on a partner tablet and accepts or rejects it within a short window
System assigns a delivery partner to pick up the order, batching nearby orders where it improves efficiency
Customer sees a live ETA and the delivery partner's location on a map once the order is picked up
Order moves through a strict state machine: placed, confirmed, food being prepared, ready, picked up, on the way, delivered (or canceled / refunded)
Delivery partner app shows the pickup, the drop, optimized route, and earnings for the trip
Customer can rate the restaurant and the delivery partner after delivery
Surge / dynamic delivery fee applied during peak demand when partner supply is tight

Non-functional requirements

Serviceability and restaurant-listing reads under 100 ms at p99 (this is the home-screen hot path)
Dispatch assignment decision within a few seconds of an order becoming ready-to-assign
Live tracking location updates every 4-5 seconds per active trip with sub-second delivery to the customer app
System sized for meal-peak load roughly 4-5x the daily average, concentrated in two 90-minute windows
Strong consistency on order state transitions and payment capture; no double charges, no lost orders
99.9%+ availability for order placement and tracking; degrade gracefully (e.g., wider ETA, fewer batches) rather than fail under peak
Geo-partitioned by city so a hot city does not starve others

Back-of-envelope scale estimates

Show your math. Pulling numbers from thin air signals you have not thought about the load.

Daily orders

~6-7M/day

Peak orders per second

~600-700 OPS

Active delivery partners (peak concurrent)

~150K-200K

390,000+ total partners; peak concurrency around 40-50% during dinner. Each emits a location ping every 4-5 seconds while on an active trip or idle-but-online.

Location pings per second (peak)

~35K-45K/s

~180K online partners / 4.5 s ping interval. Written to an in-memory geo index, never to the durable order store at full fidelity.

Serviceability + listing reads

~50K-100K reads/s at peak

Restaurants / cities

200K+ restaurants, 580+ cities

Public 2024 figures. Drives the size of the serviceability polygon index and the per-city sharding strategy.
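
The partner and ping estimates above can be reproduced with a few lines of arithmetic. This sketch only re-derives the numbers already stated (390K partners, 40-50% peak concurrency, a 4-5 second ping interval); the function names are made up for the example.

```python
TOTAL_PARTNERS = 390_000  # public 2024 figure quoted above


def concurrent_partners(percent_online):
    """Partners online at peak, given a whole-number percentage."""
    return TOTAL_PARTNERS * percent_online // 100


def pings_per_second(online_partners, ping_interval_s):
    """Each online partner sends one ping per interval."""
    return online_partners / ping_interval_s


low_online = concurrent_partners(40)    # 156,000
high_online = concurrent_partners(50)   # 195,000

# Using the ~180K midpoint assumed in the estimate:
typical = pings_per_second(180_000, 4.5)   # 40,000 pings/s
fastest = pings_per_second(180_000, 4.0)   # 45,000 pings/s
slowest = pings_per_second(180_000, 5.0)   # 36,000 pings/s
```

The 36K-45K range lines up with the ~35K-45K/s figure, which is the kind of visible derivation the "show your math" advice is asking for.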

Design Swiggy: System Design Interview Guide

Why this question is asked

Requirements

Functional requirements

Non-functional requirements

Back-of-envelope scale estimates

High-level architecture

Core components

Serviceability service

Catalog / listing + search

Order service + state machine

Restaurant partner channel

Dispatch / assignment engine

Geo / location index

ETA prediction service

Live tracking service

Surge / pricing service

Notification + event backbone

Data model

Deep dives

Serviceability: answering 'can this restaurant reach me' in under 100 ms

Just-in-time dispatch and the assignment optimization

Order batching and the Vehicle Routing Problem inside it

Decomposed ETA prediction (and why a single number is wrong)

The three-party order state machine and failure branches

Surviving the lunch/dinner peak (the part most candidates skip)

Trade-offs to discuss

GeoHash polygon index vs raw point-in-polygon vs PostGIS

Just-in-time delayed assignment vs assign-on-order-placed

MIP / global optimization per zone vs greedy nearest-partner

Batch orders aggressively vs single-order trips

In-memory geo index for live location vs durable writes

Strong consistency on orders/payments vs eventual everywhere

Surge to flatten peak demand vs fixed pricing

How Swiggy actually does it

Lessons to study before this interview

Frequently asked questions

How does Swiggy decide which restaurants to show for my address?

Why doesn't Swiggy assign a delivery partner the moment I place the order?

How does Swiggy match orders to delivery partners at peak?

How does order batching work and when does it kick in?

How does Swiggy calculate the ETA you see?

What happens if the restaurant rejects my order or no partner accepts?

How does Swiggy handle the lunch and dinner rush?

Why is the order and payment system strongly consistent but the rest isn't?
