Is this a video course?

No. This is an interactive, slide-based learning platform. Each lesson has rich text, animated diagrams, live code editors, and quizzes. You learn by reading, interacting, and doing, not by watching videos passively.

How long do I have access?

Forever. Both pricing tiers are one-time payments with lifetime access. This includes all 766 current lessons and any future content we add.

What level of experience do I need?

None. We start from absolute basics like 'What is latency?' and build up to distributed consensus protocols. The Foundation level assumes zero prior knowledge of system design.

How much does the system design course cost?

6.99 US dollars for lifetime access globally, or 399 Indian rupees for lifetime access in India. One-time payment, no subscription, no hidden fees. 11 lessons are free with no signup required.

What technologies are covered?

Everything from DNS and load balancers to Kubernetes, Kafka, distributed databases, consensus protocols, stream processing, security architecture, and observability. We cover principles and real-world implementations used at Netflix, Google, Amazon, Uber, Stripe, and more.

Is this useful for system design interview preparation?

Yes. The lessons are structured around the exact topics asked in system design interviews at FAANG and top-tier companies. Interactive diagrams help you practice whiteboard-style explanations. The course covers everything from URL shortener design to distributed payment systems.

How is this different from ByteByteGo or Educative?

766 interactive lessons (4x more than most competitors), 16 different diagram types that build step by step, real production examples from Netflix, Google, Amazon, Uber, and Stripe, and lifetime access for a one-time payment of 6.99 US dollars instead of annual subscriptions costing 100 to 200 dollars per year.

Why can't Dream11 just use normal autoscaling for the match spike?

Because reactive autoscaling adds servers only after metrics cross a threshold, and provisioning takes minutes, by which point the spike has already peaked. Dream11 built an in-house platform, Scaler, that predicts concurrency ahead of time using a forecasting model over 200-plus variables and pre-provisions capacity in phases before the match, using pre-baked images and a mix of instance types. That cut scale-out time from hours to about five minutes.

How does Dream11 make sure a contest is not oversold when millions join at once?

By keeping contests and payments in a store with real transactional guarantees, MySQL, and treating a join as an atomic operation that checks and increments the filled-spots count within a transaction, so two joins can never both take the last spot. The entry-fee charge is idempotent, so a retry during the surge does not double-charge. The high-volume, less-critical traffic is served by a fast layer so it does not all hit the transactional database.

How does Dream11 score millions of teams live during a match?

It treats scoring as a fan-out handled on a fixed cycle. A single real event, like a wicket, changes points for every team that picked those players, across thousands of contests. Dream11 described a Spark-based engine, Aryabhata, that recomputes points and ranks across thousands of contests on a short cycle, in its published design processing tens of millions of records within a 60-second service level, and serves the leaderboards from cache. Recomputing on a cycle turns an impossible per-event fan-out into a bounded, repeatable batch.

What happens when an umpire reverses a decision after points were awarded?

The scoring pipeline is built to handle it. Dream11 designed scoring as an immutable, append-only pipeline using multi-version concurrency control and snapshot isolation. Every scoring event is an insert, not an in-place update, so a reversal is handled by appending a correcting event and recomputing, rather than mutating shared state under load. That keeps scoring safe and reconstructable while the match is still in flux.

How is designing Dream11 different from a normal high-traffic app?

The spike. A normal app sees load rise and fall gradually, but Dream11 sees a massive, predictable surge in the final minutes before a match, triggered by the toss, that reactive scaling cannot absorb. That forces prediction-based pre-scaling, a contention-safe write path for fixed-capacity contests, and a live-scoring fan-out to millions of leaderboards, none of which a steady-state design needs. Note that Dream11 paused paid contests in August 2025, so this describes how the platform was engineered during its paid-contest era.

System Design Interview Guide

Dream11 System Design Interview: Fantasy Sports at Scale

Just before a big cricket match, millions of Dream11 users rush to lock their fantasy teams in the final minutes, and one event, the toss, can trigger almost all of them at once. Dream11 reported handling more than 5.5 million concurrent users during the IPL 2020 final and around 100 million requests per minute at its edge, on a base of over 200 million registered users.

Designing Dream11 is the extreme-spike problem. The load is not smooth: for a popular match, a large share of the day's users create or edit their fantasy teams in the last few minutes before the deadline, and the toss right before the match makes everyone act at once. On top of that, once the match starts, the platform has to score every user's team from live ball-by-ball data and update the ranks across thousands of contests for millions of players, continuously. The interview is about absorbing a predictable but enormous surge, keeping contest joins and money correct under that load, and fanning live scores out to huge leaderboards. Note that Dream11 paused paid contests in August 2025 after a change in Indian law, so this describes how the platform was engineered during its paid-contest era.

Asked at: Dream11, other gaming and fantasy-sports companies, and ticketing or flash-sale businesses. The general form, designing a system for a massive predictable traffic spike or a live leaderboard, shows up at Amazon, Google, and most product companies in SDE2 and SDE3 rounds. It is a favorite because the spike is both extreme and predictable, which turns the interview into a real capacity and pre-scaling conversation rather than a generic one.

Why this question is asked

Most systems are designed for load that rises and falls gradually. Dream11 is the opposite: the demand is spiky, correlated, and tied to an external event no one controls. Interviewers use it to check three things. First, whether you can handle a surge where millions of users act in the same few minutes before a deadline, which breaks reactive autoscaling because servers cannot be provisioned fast enough once the spike has started. Second, whether you can keep the write path correct under that load, so a contest with a fixed number of spots is never oversold and money is never lost. Third, whether you can design the live-scoring fan-out, where a single ball in the real match changes the points and rank of millions of fantasy teams across thousands of contests at once. The question separates candidates who can only design steady-state systems from those who can reason about a predictable megaspike and a real-time scoring pipeline.

Requirements

Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.

Functional requirements

User browses upcoming matches, picks a fantasy team of players within a budget and role rules, and edits it until the deadline
User joins one or more contests for a match, paying an entry fee, with each contest holding a fixed number of spots and a defined prize structure
Team edits and contest joins are locked at the match deadline, after which no changes are allowed
As the real match plays, each fantasy team earns points from live ball-by-ball events
Live leaderboards show every user their rank within each contest, updating through the match
At match end, ranks are finalized and winnings are credited to user wallets
User manages a wallet: deposits, entry fees, winnings, and withdrawals
Social features such as following other users and viewing their activity

Non-functional requirements

Absorb a massive, predictable spike in the final minutes before a match deadline, when a large share of users join or edit teams at once
The toss just before the match can trigger a correlated surge, so capacity must be ready before it, not provisioned after
Contest joins must be correct under extreme concurrency: a fixed-size contest is never oversold, and entry fees and winnings never drift
Live scoring must fan out to millions of users across thousands of contests with fresh ranks through the match
Very high availability during matches, since downtime during a big game is extremely costly
Read-heavy screens such as leaderboards and team views must stay fast during the surge
Cost efficiency, since the peak capacity needed for a big match is many times the everyday baseline

Back-of-envelope scale estimates

Show your math. Pulling numbers from thin air signals you have not thought about the load.

Registered users

200M+ (2023)

Dream11 reported crossing 200 million registered users in October 2023, up from 45 million in 2018. This is the base from which a match-day spike draws.

Peak concurrent users

5.5M (IPL 2020 final), ~10.5M later

Dream11 and AWS reported more than 5.5 million concurrent users during the IPL 2020 final, with later peaks reported around 10.5 million. Concurrency, not total users, is what the match-day design is sized for.

Peak edge requests

~100M requests/minute

Dream11 reported around 100 million requests per minute at its edge services during peak match traffic. This is the request rate the front door and caches must absorb.

Compute at peak

30,000+ resources, 700 ASGs

Dream11 reported using more than 30,000 compute resources at peak, managed as roughly 700 auto-scaling groups behind about 200 load balancers, across 100-plus sporting events a day, at 99.99 percent uptime.

Real-time data store throughput

~1.8M ops/s, sub-15ms

Dream11's Aerospike layer was reported to serve around 1.8 million operations per second at peak with sub-15-millisecond latency, handling about 35 terabytes of data a day. It is the hot store behind the match-day load.

Cost of downtime

~$1M per minute

A minute of downtime during peak was reported to cost Dream11 around one million dollars, which is why the design targets 99.99 percent uptime and pre-provisions capacity.
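The reported figures can be turned into per-second and downtime-budget terms with simple arithmetic. The sketch below is illustrative only: the four-hour match window is a hypothetical assumption, and dividing the edge request rate by the IPL 2020 concurrency peak mixes figures from different moments, so treat that ratio as a rough order of magnitude.

```python
# Back-of-envelope conversions from the reported figures above.
# The 4-hour match window is a hypothetical assumption, and the
# requests-per-user ratio mixes figures from different events.

SECONDS_PER_MINUTE = 60
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def per_second(per_minute: float) -> float:
    """Convert a per-minute rate into a per-second rate."""
    return per_minute / SECONDS_PER_MINUTE


def downtime_budget_minutes(availability: float, period_minutes: float) -> float:
    """Minutes of downtime an availability target allows over a period."""
    return (1.0 - availability) * period_minutes


edge_rps = per_second(100_000_000)                  # about 1.67 million requests per second
requests_per_user_minute = 100_000_000 / 5_500_000  # about 18 requests per concurrent user per minute
yearly_budget = downtime_budget_minutes(0.9999, MINUTES_PER_YEAR)  # about 52.6 minutes per year
match_window_budget = downtime_budget_minutes(0.9999, 4 * 60)      # hypothetical 4-hour window

print(f"edge load:           {edge_rps:,.0f} requests/s")
print(f"per concurrent user: {requests_per_user_minute:.1f} requests/min")
print(f"99.99% over a year:  {yearly_budget:.1f} minutes of downtime")
print(f"99.99% over 4 hours: {match_window_budget * 60:.2f} seconds of downtime")
```

At 99.99 percent a full year allows roughly 52.6 minutes of downtime, but the same target applied to a single four-hour match window allows under two seconds, which shows how little room a minutes-long reactive scale-out leaves during a big match.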

High-level architecture

Design Dream11 around one fact: the load is a predictable megaspike, not a steady stream. Everything follows from that. All of this runs on AWS, and the numbers below are what Dream11 reported during its paid-contest era.

The first problem is capacity for the spike. For a popular match, a large share of users create or edit teams and join contests in the final minutes, and the toss just before the match can make them all act at once. Reactive autoscaling does not work here, because by the time metrics show the surge, it is too late to boot servers. So Dream11 built an in-house scaling platform, Scaler, that pre-provisions capacity before the match using prediction. A forecasting model based on XGBoost iterates over 200-plus variables, such as the number and popularity of matches, active users, and transaction sizes, to forecast concurrency, and Scaler scales capacity out in phases against slabs of expected concurrency. It uses fully-baked machine images, a diverse mix of instance types including spot instances, and weighted DNS across load-balancer shards to add capacity fast, which cut scale-out time from hours to about five minutes and let the platform forecast roughly 10 million active users about two hours before a major match and be ready for it.

The second problem is the write path: joining contests and locking teams. A contest has a fixed number of spots and an entry fee, so under a surge you must never oversell it or lose money. This is a strongly consistent, contention-heavy path. Transactional data such as registrations, team selections, and payments is held in MySQL with the ACID guarantees that money and fixed-capacity contests need, while the fast, high-volume reads and writes around the match are served by Aerospike, which Dream11 adopted as a combined persistence and caching layer at around 1.8 million operations per second with sub-15-millisecond latency. A wallet ledger records entry fees and winnings.

The third problem is live scoring. Once the match starts, a stream of ball-by-ball events arrives, and each event can change the points and rank of millions of fantasy teams across thousands of contests. Dream11 described a leaderboard engine, called Aryabhata, built on Apache Spark that recomputes points, groups by contest, and ranks users, generating leaderboards across thousands of contests on a tight cycle, in its published design processing tens of millions of records within a 60-second service level. The scoring is designed as an immutable, append-only pipeline using multi-version concurrency control and snapshot isolation, so events can be processed concurrently and rolled back cleanly if a scoring decision is reversed. Cassandra was the store for this leaderboard system in the published design, with Redis caching in front. A separate social graph, the follow network, was built on Amazon Neptune with Redis sorted sets for fast follower lookups.

In a real interview, sketch this on the whiteboard before diving into any single box.

Core components

Walk through each service. The interviewer wants to hear what each one owns, not just the names.

Scaler: prediction-based pre-scaling

Dream11's in-house platform that provisions capacity before a match rather than reacting to it. A forecasting model based on XGBoost, over 200-plus variables, predicts concurrency, and Scaler scales out in phases against concurrency slabs using pre-baked images, mixed and spot instance types, and weighted DNS across load-balancer shards. It cut scale-out time from hours to about five minutes.

Edge and API layer

The front door that absorbed around 100 million requests per minute at peak, fronted by roughly 200 load balancers and about 700 auto-scaling groups. It handles authentication, routing, and read-heavy traffic, and is the layer Scaler pre-provisions most aggressively before a match.

Contest and team service (MySQL)

The strongly consistent write path for joining contests and locking teams. Held in MySQL for ACID guarantees, because a fixed-size contest must never be oversold and entry fees must be exact. It enforces the deadline lock after which no team edits are allowed.

Aerospike real-time store

The combined persistence and caching layer for the hot match-day workload, adopted in place of an earlier relational persistence setup. Reported at around 1.8 million operations per second with sub-15-millisecond latency and about 35 terabytes of data a day, it serves the high-volume reads and writes that a big match generates.

Live scoring engine (Aryabhata on Spark)

The pipeline that turns live ball-by-ball match events into fantasy points and ranks. Built on Apache Spark, it recomputes points across thousands of contests on a tight cycle, in Dream11's published design processing tens of millions of records within a 60-second service level, and fans the results out to leaderboards for millions of users.

Leaderboard store and cache

In the published leaderboard design, Apache Cassandra was the primary store, chosen for high write throughput and tunable consistency, with Redis caching in front so the constant leaderboard reads during a match stay fast and do not overload the store.

Wallet and payments

The ledger that tracks deposits, entry fees, winnings, and withdrawals per user. Strongly consistent, because it is real money, and it settles winnings after a match once ranks are final.

Social graph (Neptune + Redis)

The follow network, built on Amazon Neptune as a graph database with Redis sorted sets in front for fast follower lists and counts. It is kept separate from the contest and scoring path so social traffic does not compete with match-critical work.

Data model

Pick the right store per table. Justify each choice with the access pattern, not by reflex.

users

user_id (PK), name, wallet_balance_paise, kyc_status, created_at

The player account and wallet balance. Transactional data held in MySQL. The base from which a match-day spike draws.

matches

match_id (PK), sport, teams, start_time, deadline, state

The real-world match. The deadline is when team edits and contest joins lock, and start_time is close to the toss that triggers the final surge. State drives whether joins are open, locked, live, or settled.
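The match state drives whether joins are open, locked, live, or settled. A minimal sketch of that lifecycle as a forward-only state machine, assuming the four states named above and a hypothetical `can_edit` guard that checks both the state and a server-side clock, so a delayed locking job cannot leave edits open past the deadline:

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class MatchState(Enum):
    OPEN = "open"        # team edits and contest joins allowed
    LOCKED = "locked"    # deadline passed, waiting for the match to start
    LIVE = "live"        # match in progress, scoring running
    SETTLED = "settled"  # ranks final, winnings credited


# Forward-only transitions: a match never returns to an earlier state.
TRANSITIONS = {
    MatchState.OPEN: {MatchState.LOCKED},
    MatchState.LOCKED: {MatchState.LIVE},
    MatchState.LIVE: {MatchState.SETTLED},
    MatchState.SETTLED: set(),
}


def transition(current: MatchState, target: MatchState) -> MatchState:
    """Move to the target state, rejecting any transition not listed above."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def can_edit(state: MatchState, deadline: datetime, now: datetime) -> bool:
    """Edits and joins need an OPEN match AND a server-side time before the deadline."""
    return state is MatchState.OPEN and now < deadline


deadline = datetime(2024, 5, 1, 14, 0, tzinfo=timezone.utc)  # hypothetical deadline
print(can_edit(MatchState.OPEN, deadline, deadline - timedelta(minutes=1)))  # True
print(can_edit(MatchState.OPEN, deadline, deadline))                         # False
```

The time check uses the server's clock, never the client's, so a user cannot slip an edit in by sending a stale timestamp.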

contests

contest_id (PK), match_id, entry_fee_paise, total_spots, filled_spots, prize_structure, state

A contest for a match with a fixed number of spots. filled_spots must never exceed total_spots even under a surge of joins, so this is the fixed-capacity, contention-heavy write. Held in MySQL for ACID.

fantasy_teams

team_id (PK), user_id, match_id, player_ids[], captain_id, vice_captain_id, locked_at

The eleven players a user picked for a match, within budget and role rules. Editable until the deadline, then locked. The captain and vice-captain earn multiplied points.
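Team creation is validated before it is saved. The sketch below checks an eleven-player team against a budget and role rules. The specific limits (a 100-credit budget, at most 7 players from one real team, 1 to 8 players per role) are hypothetical values chosen for illustration, not confirmed Dream11 rules.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    player_id: int
    real_team: str
    role: str       # e.g. "WK", "BAT", "AR", "BOWL"
    credits: float


# Hypothetical rule values for illustration; the real limits may differ.
TEAM_SIZE = 11
CREDIT_BUDGET = 100.0
MAX_FROM_ONE_REAL_TEAM = 7
ROLE_LIMITS = {"WK": (1, 8), "BAT": (1, 8), "AR": (1, 8), "BOWL": (1, 8)}


def validate_team(players, captain_id, vice_captain_id):
    """Return a list of rule violations; an empty list means the team is valid."""
    errors = []
    ids = [p.player_id for p in players]
    if len(players) != TEAM_SIZE or len(set(ids)) != TEAM_SIZE:
        errors.append(f"need {TEAM_SIZE} distinct players")
    if sum(p.credits for p in players) > CREDIT_BUDGET:
        errors.append("over credit budget")
    per_team = Counter(p.real_team for p in players)
    if per_team and max(per_team.values()) > MAX_FROM_ONE_REAL_TEAM:
        errors.append("too many players from one real team")
    roles = Counter(p.role for p in players)
    for role, (low, high) in ROLE_LIMITS.items():
        if not low <= roles[role] <= high:  # Counter returns 0 for a missing role
            errors.append(f"{role} count out of range")
    if captain_id not in ids or vice_captain_id not in ids or captain_id == vice_captain_id:
        errors.append("captain and vice-captain must be two different picked players")
    return errors
```

Returning every violation, rather than failing on the first, lets the client show all problems with a team in one round trip.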

contest_entries

entry_id (PK), contest_id (FK), user_id, team_id, joined_at

A user's team entered into a specific contest. Creating this row is the contest-join write that must respect the contest's fixed capacity and charge the entry fee exactly once.

player_scores

score_id (PK), match_id, player_id, event, points_delta, ts

Append-only stream of scoring events derived from live ball-by-ball data. Immutable, so events can be processed concurrently and rolled back cleanly. This is the input to the scoring engine.

leaderboards

contest_id, user_id, points, rank, updated_at

The ranked standing per contest, recomputed on a tight cycle during the match and served from cache. The highest-fan-out data in the system, since one scoring event can change ranks for millions of entries.

wallet_ledger

entry_id (PK), user_id, direction (credit|debit), amount_paise, reason, created_at

Append-only record of money movement: deposits, entry fees, winnings, withdrawals. Strong consistency required. The balance is always reconstructable from the ledger.
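Because the ledger is append-only, a balance is simply credits minus debits for a user. The in-memory sketch below assumes a hypothetical `idempotency_key` per logical money movement (not a column in the table above) so that a retried entry-fee debit is recorded once; a real system would persist this transactionally, and `wallet_balance_paise` on the users table would be a cached projection of the same sum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    idempotency_key: str  # hypothetical: unique per logical money movement
    user_id: int
    direction: str        # "credit" or "debit"
    amount_paise: int     # integer paise avoid floating-point rounding
    reason: str


class WalletLedger:
    """In-memory sketch of an append-only ledger; never updates or deletes entries."""

    def __init__(self):
        self._entries = []
        self._seen_keys = set()

    def append(self, entry: LedgerEntry) -> bool:
        """Record an entry; return False if this key was already applied (a retry)."""
        if entry.direction not in ("credit", "debit") or entry.amount_paise <= 0:
            raise ValueError("invalid ledger entry")
        if entry.idempotency_key in self._seen_keys:
            return False  # replaying the same operation is a no-op
        if entry.direction == "debit" and entry.amount_paise > self.balance(entry.user_id):
            raise ValueError("insufficient balance")
        self._entries.append(entry)
        self._seen_keys.add(entry.idempotency_key)
        return True

    def balance(self, user_id: int) -> int:
        """Reconstruct the balance from the ledger alone: credits minus debits."""
        total = 0
        for e in self._entries:
            if e.user_id == user_id:
                total += e.amount_paise if e.direction == "credit" else -e.amount_paise
        return total
```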

Deep dives

These are the conversations the interviewer is steering you toward. Practice each one until you can talk through it without notes.

The pre-match spike and why reactive autoscaling fails

The defining challenge is that demand is not smooth. For a popular match, a large share of users create or edit teams and join contests in the final minutes before the deadline, and the toss just before the match is an external trigger that makes them act at nearly the same moment. Reactive autoscaling, which adds servers when metrics cross a threshold, cannot keep up, because provisioning takes minutes and the spike has already peaked by then. Dream11's answer was to pre-provision using prediction. Its Scaler platform runs a forecasting model based on XGBoost over 200-plus variables, such as how many matches are on, how popular they are, active users, and transaction sizes, to forecast concurrency ahead of time, then scales capacity out in phases against slabs of expected concurrency. It uses fully-baked machine images so new instances are ready instantly, a diverse mix of instance types including spot instances for cost, and weighted DNS across load-balancer shards to spread the new capacity. The reported result was scale-out time cut from hours to about five minutes, and forecasting roughly 10 million active users about two hours before a major match. The interview point is that a predictable spike should be met with prediction and pre-scaling, not reaction.
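The slab-and-phase idea can be sketched independently of the forecasting model. Assume the model has already produced a concurrency forecast; the code below picks the smallest slab that covers it plus a safety margin, then splits the scale-out into phases. The slab table, the 20 percent headroom, and the three-phase default are all hypothetical values, not Scaler's actual configuration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Slab:
    max_concurrency: int  # upper bound of expected concurrent users for this slab
    instances: int        # hypothetical instance count provisioned for the slab


# Hypothetical slab table; real values would come from load testing.
SLABS = [
    Slab(1_000_000, 500),
    Slab(3_000_000, 1_400),
    Slab(6_000_000, 2_700),
    Slab(12_000_000, 5_200),
]


def pick_slab(forecast_concurrency: float, headroom: float = 0.2) -> Slab:
    """Choose the smallest slab covering the forecast plus a safety margin."""
    target = forecast_concurrency * (1 + headroom)
    for slab in SLABS:
        if target <= slab.max_concurrency:
            return slab
    raise ValueError("forecast exceeds the largest slab; add a bigger slab first")


def phased_plan(current_instances: int, target_instances: int, phases: int = 3) -> list:
    """Instance level after each phase, reaching the target in the last phase."""
    if target_instances <= current_instances:
        return [current_instances] * phases  # already at or above target: scale-out only
    step = math.ceil((target_instances - current_instances) / phases)
    plan, level = [], current_instances
    for _ in range(phases):
        level = min(target_instances, level + step)
        plan.append(level)
    return plan


slab = pick_slab(4_500_000)         # 4.5M forecast * 1.2 headroom fits the 6M slab
print(slab.instances)               # 2700
print(phased_plan(500, slab.instances))  # [1234, 1968, 2700]
```

Phasing means each step is smaller and finishes well before the deadline, and a bad forecast can be corrected between phases rather than discovered at the toss.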

Contest joins under extreme concurrency

A contest has a fixed number of spots and an entry fee, so when millions try to join in the final minutes, the write path must stay correct: the contest cannot be oversold, and each entry fee must be charged exactly once. This is a fixed-capacity, high-contention problem, the same shape as a flash sale. The correct approach keeps this data in a store with real transactional guarantees, which is why Dream11 used MySQL for registrations, team selection, and payments. The join itself is an atomic operation that checks and increments the filled-spots count within a transaction, or uses an equivalent guarded counter, so two joins cannot both take the last spot. The entry-fee debit is idempotent, so a retry during the surge does not double-charge. Around this strongly consistent core, the high-volume reads and less-critical writes are served by the fast Aerospike layer, so the surge does not all land on the transactional database.

Live scoring and the leaderboard fan-out

Once the match starts, scoring is a fan-out problem. A single real-world event, a wicket or a boundary, changes the points for every fantasy team that picked those players, which changes ranks across thousands of contests for millions of users. Dream11 described a leaderboard engine, Aryabhata, built on Apache Spark, that takes the stream of scoring events, recomputes points, groups entries by contest, and ranks them, generating leaderboards across thousands of contests on a tight cycle, in its published design processing tens of millions of records within a 60-second service level. Rather than update every user the instant a ball is bowled, it recomputes and publishes on a short, fixed cycle, which turns an impossible per-event fan-out into a bounded, repeatable batch. The results are stored, in the published design in Cassandra, with Redis caching in front, so the constant leaderboard reads during the match stay fast.
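One recompute cycle has a simple shape: score every entry from the current player points, group by contest, and rank. The single-process sketch below shows only that shape, not Aryabhata's Spark implementation. The 2x captain and 1.5x vice-captain multipliers are assumptions (the data model only says they earn multiplied points), and ties share a rank using standard competition ranking (1, 1, 3).

```python
from collections import defaultdict

# Assumed multipliers; the source only says captain and vice-captain earn multiplied points.
CAPTAIN_MULTIPLIER = 2.0
VICE_CAPTAIN_MULTIPLIER = 1.5


def team_points(player_ids, captain_id, vice_captain_id, player_points):
    """Sum a team's points from per-player totals, applying the multipliers."""
    total = 0.0
    for pid in player_ids:
        pts = player_points.get(pid, 0.0)
        if pid == captain_id:
            pts *= CAPTAIN_MULTIPLIER
        elif pid == vice_captain_id:
            pts *= VICE_CAPTAIN_MULTIPLIER
        total += pts
    return total


def recompute_leaderboards(entries, teams, player_points):
    """One batch cycle over (contest_id, user_id, team_id) entries -> {contest_id: [(rank, user_id, points)]}."""
    by_contest = defaultdict(list)
    for contest_id, user_id, team_id in entries:
        t = teams[team_id]
        pts = team_points(t["players"], t["captain"], t["vice_captain"], player_points)
        by_contest[contest_id].append((user_id, pts))
    boards = {}
    for contest_id, rows in by_contest.items():
        rows.sort(key=lambda r: r[1], reverse=True)  # highest points first
        ranked, prev_points, rank = [], None, 0
        for position, (user_id, pts) in enumerate(rows, start=1):
            if pts != prev_points:  # ties keep the earlier rank: 1, 1, 3
                rank, prev_points = position, pts
            ranked.append((rank, user_id, pts))
        boards[contest_id] = ranked
    return boards
```

The work per cycle depends on the number of entries, not on how many balls were bowled since the last cycle, which is what makes the batch bounded and predictable.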

The immutable, append-only scoring pipeline

Live sports scoring has a hard requirement most systems do not: decisions can be reversed. A third umpire overturns a call, a catch is deemed not out, and the points already awarded must be corrected. Dream11 designed the scoring as an immutable, append-only pipeline using multi-version concurrency control and snapshot isolation. Every scoring event is an insert, never an in-place update, so the state at any point is reconstructable, events can be processed concurrently without corrupting each other, and a reversal is handled by appending a correcting event and recomputing, rather than trying to mutate shared state under load. This is what lets the platform score safely and quickly while the match is still in flux.
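The reversal handling can be illustrated with a small append-only log. This is a sketch of the append-and-compensate idea, not Dream11's storage layer: `ScoringLog`, the `corrects_seq` field, and reading totals "as of" a sequence number (a stand-in for a snapshot) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScoreEvent:
    seq: int                           # position in the log, strictly increasing
    player_id: int
    points_delta: float
    event: str
    corrects_seq: Optional[int] = None  # hypothetical: seq of the event this one reverses


class ScoringLog:
    """Append-only log: events are never edited or deleted."""

    def __init__(self):
        self._events = []

    def append(self, player_id, points_delta, event, corrects_seq=None):
        ev = ScoreEvent(len(self._events) + 1, player_id, points_delta, event, corrects_seq)
        self._events.append(ev)
        return ev

    def reverse(self, seq):
        """Handle an overturned decision by appending a compensating event."""
        original = self._events[seq - 1]
        return self.append(original.player_id, -original.points_delta,
                           f"reversal of {original.event}", corrects_seq=seq)

    def player_points(self, as_of_seq=None):
        """Fold the log up to a point; any earlier snapshot stays reproducible."""
        limit = len(self._events) if as_of_seq is None else as_of_seq
        totals = {}
        for ev in self._events[:limit]:
            totals[ev.player_id] = totals.get(ev.player_id, 0.0) + ev.points_delta
        return totals


log = ScoringLog()
log.append(5, 25.0, "wicket")   # seq 1
log.append(6, 4.0, "boundary")  # seq 2
log.reverse(1)                  # third umpire overturns the wicket: seq 3
print(log.player_points())      # {5: 0.0, 6: 4.0}
```

Because the original wicket is still in the log, the scores users saw before the reversal can be reconstructed, and the next recompute cycle picks up the correction without any in-place update.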

Data stores: matching each workload to the right engine

Dream11's design uses different stores for different jobs rather than one database for everything. MySQL holds the transactional data, namely registrations, team selection, contests, and payments, where ACID guarantees matter because money and fixed-capacity contests cannot drift. Aerospike serves the hot match-day workload as a combined persistence and caching layer, reported at around 1.8 million operations per second with sub-15-millisecond latency and 35 terabytes a day, adopted in place of an earlier relational persistence setup that could not keep up. In the published leaderboard design, Cassandra stored the leaderboard data for its high write throughput and tunable consistency, with Redis caching the hot reads. The social follow network is built separately on Amazon Neptune, a graph database, with Redis sorted sets for fast follower counts. The lesson is to pick the store per access pattern: strong consistency for money, a fast key-value layer for the match-day hot path, a write-optimized store for leaderboards, and a graph store for the social network.

Availability, cost, and the price of a bad minute

During a big match, downtime is extraordinarily expensive: Dream11 reported that a minute of downtime at peak cost around one million dollars, which is why the design targets 99.99 percent uptime and pre-provisions capacity rather than risk running short. Cost is the other side of the same coin, because the capacity needed for a big match is many times the everyday baseline, and paying for that around the clock would be wasteful. Dream11 managed this with the mix of spot and on-demand instances in Scaler and by scaling capacity in and out around each match, and separately cut compute cost by around 42 percent by migrating to more efficient instance types. The interview framing is that extreme reliability during the spike and cost efficiency the rest of the time are both requirements, and pre-scaling with a diverse instance mix is how you get both.

Trade-offs to discuss

Every senior interviewer expects you to surface at least 3 of these. Pick the decisions, state the alternatives, and justify your choice.

Prediction-based pre-scaling versus reactive autoscaling

Reactive autoscaling is standard and simple, but it provisions too slowly for a spike that peaks in minutes and is triggered by an external event like the toss. Predicting concurrency ahead of time and pre-provisioning against it, as Dream11 did with Scaler, means capacity is ready before the surge. The cost is the complexity of building and trusting a forecasting model and paying for capacity slightly ahead of need, accepted because being caught short during a big match is far more expensive.

MySQL for contests and money versus a single scalable NoSQL store

A NoSQL store scales writes easily, but a fixed-capacity contest and a wallet cannot tolerate the weak guarantees, since overselling spots or losing entry fees is unacceptable. Dream11 kept this data in MySQL for ACID guarantees, and offloaded the high-volume, less-critical match-day traffic to Aerospike. The cost is running more than one kind of store and deciding what belongs where, accepted because correctness on money and capacity is non-negotiable.

Recomputing leaderboards on a fixed cycle versus updating per event

Updating every affected user the instant a ball is bowled would give the freshest ranks but is an impossible fan-out at this scale, since one event touches millions of entries across thousands of contests. Recomputing on a short fixed cycle, as Dream11 did within a 60-second service level, bounds the work into a repeatable batch that finishes predictably. The cost is that ranks can be up to a cycle stale, which is an acceptable trade for being able to serve them to everyone at all.

Immutable append-only scoring versus mutable in-place updates

Updating scores in place is simpler, but live sports decisions get reversed, and mutating shared score state under heavy concurrent processing is error-prone. An immutable, append-only pipeline with snapshot isolation lets events be processed concurrently and corrections be handled by appending and recomputing, so the state is always reconstructable and safe. The cost is more storage and a recompute step, accepted because correctness under reversal and concurrency is essential.

Aerospike as a combined persistence and cache versus a relational persistence layer

A relational store as the persistence layer is familiar, but it could not keep up with the match-day operation rate. Aerospike as a combined persistence and caching layer served around 1.8 million operations per second at sub-15-millisecond latency, removing a separate cache tier for that workload. The cost is operating a specialized store and keeping the data model suited to it, accepted because the match-day throughput demanded it.

Spot and mixed instances versus all on-demand capacity

Running all peak capacity on on-demand instances is simplest and most reliable but very expensive for capacity used only around matches. Using a diverse mix that includes spot instances, as Scaler does, cuts cost sharply. The cost is that spot capacity can be reclaimed, so the design must tolerate losing some instances and diversify across types and pools, which Dream11 accepted to make peak capacity affordable.
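Diversifying across pools bounds how much capacity a single spot reclamation can take away. The sketch below spreads instances evenly over pools and refuses any plan in which one pool would hold more than a chosen share. The pool names and the 25 percent cap are hypothetical, and this is a generic illustration of diversification, not how Scaler or any AWS API allocates capacity.

```python
import math


def allocate_across_pools(total_instances: int, pools: list, max_share: float = 0.25) -> dict:
    """Spread capacity evenly so no single pool holds more than max_share of it."""
    if not pools:
        raise ValueError("need at least one pool")
    cap = math.floor(total_instances * max_share)
    if cap * len(pools) < total_instances:
        raise ValueError("not enough pools to respect the per-pool cap")
    base, extra = divmod(total_instances, len(pools))
    # The first `extra` pools take one more instance so the total is exact.
    return {pool: base + (1 if i < extra else 0) for i, pool in enumerate(pools)}


def worst_case_loss_fraction(allocation: dict) -> float:
    """Share of capacity lost if the single largest pool is reclaimed at once."""
    return max(allocation.values()) / sum(allocation.values())


pools = ["pool-a", "pool-b", "pool-c", "pool-d", "pool-e"]  # hypothetical type/zone pools
alloc = allocate_across_pools(1000, pools)
print(alloc)                             # 200 instances in each pool
print(worst_case_loss_fraction(alloc))   # 0.2
```

With five pools, losing any one of them removes at most a fifth of the fleet, which the remaining headroom can be sized to absorb.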

How Dream11 actually does it

Most of this comes from Dream11's own engineering blog and AWS case studies, and describes the platform during its paid-contest era. Dream11 reported crossing 200 million registered users in October 2023, handling more than 5.5 million concurrent users during the IPL 2020 final, with later peaks reported around 10.5 million, and around 100 million requests per minute at its edge.

It described its in-house Scaler platform using a forecasting model based on XGBoost over 200-plus variables to pre-provision capacity in phases against concurrency slabs, using pre-baked images, mixed and spot instances, and weighted DNS, cutting scale-out time from hours to about five minutes and forecasting roughly 10 million active users about two hours before a major match, with more than 30,000 compute resources, about 700 auto-scaling groups, and around 200 load balancers at peak, at 99.99 percent uptime across 100-plus events a day.

Its published leaderboard engine, Aryabhata, used Apache Spark to recompute points and ranks across thousands of contests within a 60-second service level over tens of millions of records, with Cassandra as the store and Redis caching, and an immutable append-only design using multi-version concurrency control and snapshot isolation.

It reported adopting Aerospike as a combined persistence and caching layer at around 1.8 million operations per second with sub-15-millisecond latency and 35 terabytes a day, using MySQL for transactional data, running a social follow network on Amazon Neptune with Redis, and cutting compute cost by about 42 percent through an instance migration. All of it runs on AWS.

Three accuracy notes for the interview. First, the leaderboard figures are from Dream11's earlier published design and the platform will have evolved, so present them as how Dream11 described its live-scoring engine. Second, the scaling model is XGBoost, not a neural network, and some widely repeated figures such as a fixed joins-per-second number are not confirmed by primary sources, so avoid them. Third, Dream11 paused paid contests in August 2025 after India's online gaming law changed, so describe the paid-contest scale in past terms.


Lessons to study before this interview

If any of these topics are fuzzy, the interviewer will catch it. Each lesson is 15 to 60 minutes with diagrams, code, and a quiz.

Cache Stampede Prevention

foundation / caching strategies

Rate Limiting for Resilience

advanced / reliability resilience

Load Balancing

foundation / core fundamentals

Cache-Aside Pattern

foundation / caching strategies

Message Queues

intermediate / messaging event systems

Database Sharding

foundation / database fundamentals

High Availability

advanced / reliability resilience

Frequently asked questions

Practice with 766 system design lessons

Lifetime access for INR 399 or $6.99. Interactive diagrams, runnable code, quizzes, and 20 capstone projects including Design Dream11.

Dream11 System Design Interview: Fantasy Sports at Scale

Why this question is asked

Requirements

Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.

Functional requirements

User browses upcoming matches, picks a fantasy team of players within a budget and role rules, and edits it until the deadline
User joins one or more contests for a match, paying an entry fee, with each contest holding a fixed number of spots and a defined prize structure
Team edits and contest joins are locked at the match deadline, after which no changes are allowed
As the real match plays, each fantasy team earns points from live ball-by-ball events
Live leaderboards show every user their rank within each contest, updating through the match
At match end, ranks are finalized and winnings are credited to user wallets
User manages a wallet: deposits, entry fees, winnings, and withdrawals
Social features such as following other users and viewing their activity

Non-functional requirements

Absorb a massive, predictable spike in the final minutes before a match deadline, when a large share of users join or edit teams at once
The toss just before the match can trigger a correlated surge, so capacity must be ready before it, not provisioned after
Contest joins must be correct under extreme concurrency: a fixed-size contest is never oversold, and every entry-fee debit and winnings credit is applied exactly once, so wallet balances never drift
Live scoring must fan out to millions of users across thousands of contests, keeping ranks fresh throughout the match
Very high availability during matches, since downtime during a big game is extremely costly
Read-heavy screens such as leaderboards and team views must stay fast during the surge
Cost efficiency, since the peak capacity needed for a big match is many times the everyday baseline
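The no-oversell and exact-money requirements are commonly met by making the capacity check and the write a single atomic step, rather than reading the spot count and then writing. This is a minimal sketch using Python's built-in `sqlite3` as a stand-in for a relational store such as MySQL. The table names, column names, and `join_contest` are hypothetical, and the conditional-update technique shown is a general pattern, not Dream11's confirmed implementation. Money is kept in integer paise so balances cannot drift through floating-point rounding.

```python
import sqlite3

SCHEMA = """
CREATE TABLE contests (
    contest_id INTEGER PRIMARY KEY,
    capacity   INTEGER NOT NULL,
    filled     INTEGER NOT NULL DEFAULT 0 CHECK (filled <= capacity)
);
CREATE TABLE wallets (
    user_id       INTEGER PRIMARY KEY,
    balance_paise INTEGER NOT NULL CHECK (balance_paise >= 0)
);
CREATE TABLE entries (
    contest_id INTEGER NOT NULL,
    user_id    INTEGER NOT NULL,
    team_id    INTEGER NOT NULL,
    PRIMARY KEY (contest_id, user_id, team_id)
);
"""


class JoinRejected(Exception):
    """Raised inside the transaction so every earlier write in it is rolled back."""


def join_contest(conn: sqlite3.Connection, contest_id: int, user_id: int,
                 team_id: int, fee_paise: int) -> str:
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            # Claim a spot only if one is free: check and increment in one statement.
            cur = conn.execute(
                "UPDATE contests SET filled = filled + 1 "
                "WHERE contest_id = ? AND filled < capacity",
                (contest_id,),
            )
            if cur.rowcount != 1:
                raise JoinRejected("contest full")  # or the contest does not exist
            # Debit the fee only if the balance covers it.
            cur = conn.execute(
                "UPDATE wallets SET balance_paise = balance_paise - ? "
                "WHERE user_id = ? AND balance_paise >= ?",
                (fee_paise, user_id, fee_paise),
            )
            if cur.rowcount != 1:
                raise JoinRejected("insufficient balance")
            # A duplicate entry raises IntegrityError, which also rolls everything back.
            conn.execute(
                "INSERT INTO entries (contest_id, user_id, team_id) VALUES (?, ?, ?)",
                (contest_id, user_id, team_id),
            )
        return "joined"
    except JoinRejected as reason:
        return str(reason)


# Usage: a two-spot contest; user 13 cannot afford the fee, user 12 arrives too late.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO contests (contest_id, capacity) VALUES (1, 2)")
conn.executemany("INSERT INTO wallets VALUES (?, ?)",
                 [(10, 5000), (11, 5000), (12, 5000), (13, 100)])
conn.commit()
for user in (13, 10, 11, 12):
    print(user, join_contest(conn, 1, user, team_id=1, fee_paise=4900))
# 13 insufficient balance
# 10 joined
# 11 joined
# 12 contest full
```

In MySQL with InnoDB, the same conditional `UPDATE` locks the contest row, so concurrent joins for one contest serialize on it. For a very large contest that single row can become a hot spot; one common mitigation is to split the contest's spots across several counter rows.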

Back-of-envelope scale estimates

Show your math. Pulling numbers from thin air signals you have not thought about the load.

Registered users

200M+ (2023)

Dream11 reported crossing 200 million registered users in October 2023, up from 45 million in 2018. This is the base from which a match-day spike draws.

Peak concurrent users

5.5M (IPL 2020 final), ~10.5M later

Peak edge requests

~100M requests/minute

Dream11 reported around 100 million requests per minute at its edge services during peak match traffic. This is the request rate the front door and caches must absorb.
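Converting the per-minute figure to per-second terms makes it easier to compare with per-instance capacity. The quick check below does that; pairing it with the ~10.5M concurrency figure assumes the two peaks were measured at the same moment, which the reported numbers do not confirm.

```python
EDGE_REQUESTS_PER_MINUTE = 100_000_000  # reported peak at the edge
PEAK_CONCURRENT_USERS = 10_500_000      # assumption: measured at the same moment

requests_per_second = EDGE_REQUESTS_PER_MINUTE / 60
requests_per_user_per_second = requests_per_second / PEAK_CONCURRENT_USERS
seconds_between_user_requests = 1 / requests_per_user_per_second

print(f"{requests_per_second:,.0f} requests/s at the edge")                 # 1,666,667
print(f"{requests_per_user_per_second:.3f} requests/s per user")            # 0.159
print(f"one request every {seconds_between_user_requests:.1f} s per user")  # 6.3
```

Roughly 1.7 million requests per second, or about one request every six seconds per active user under that assumption, is a plausible sanity check for an app that polls live scores and leaderboards.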

Compute at peak

30,000+ resources, 700 Auto Scaling groups (ASGs)
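The requirements above say capacity must be ready before the toss rather than provisioned after it, so compute on this scale has to be added ahead of time. This is a minimal sketch of turning a concurrency forecast into a phased scale-up plan. The forecast is only an input here (the forecasting model itself is out of scope), and `prescale_plan`, the phase times, the users-per-instance figure, and the 30 percent headroom are all hypothetical values, not Dream11's.

```python
from dataclasses import dataclass

# Hypothetical sizing constants, for illustration only.
USERS_PER_INSTANCE = 2_000
HEADROOM_PERCENT = 130  # provision 30 percent above the forecast peak


@dataclass(frozen=True)
class Phase:
    minutes_before_deadline: int
    percent_of_peak: int


# Hypothetical schedule: reach full capacity well before the pre-deadline rush.
PHASES = [Phase(90, 50), Phase(45, 80), Phase(20, 100)]


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def prescale_plan(predicted_peak_users: int, current_instances: int) -> list[tuple[int, int]]:
    """Return (minutes_before_deadline, target_instances) steps that never scale down."""
    peak_instances = ceil_div(predicted_peak_users * HEADROOM_PERCENT,
                              100 * USERS_PER_INSTANCE)
    plan = []
    target = current_instances
    for phase in PHASES:
        target = max(target, ceil_div(peak_instances * phase.percent_of_peak, 100))
        plan.append((phase.minutes_before_deadline, target))
    return plan


print(prescale_plan(predicted_peak_users=10_000_000, current_instances=1_000))
# [(90, 3250), (45, 5200), (20, 6500)]
```

Scaling in steps gives each batch of instances time to boot and pass health checks before the next one, and taking the maximum with the current count ensures the plan never shrinks capacity on the way to the deadline.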

Real-time data store throughput

~1.8M ops/s at sub-15 ms latency
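Little's law (items in flight = arrival rate × time in system) turns the throughput and latency figures into a concurrency figure, which is what client connection pools are sized against. Treating 15 ms as the latency gives an upper bound. The 1,000 client instances below are a hypothetical number, used only to show the per-instance share.

```python
OPS_PER_SECOND = 1_800_000  # reported store throughput
LATENCY_MS = 15             # upper bound: reported latency is under 15 ms
CLIENT_INSTANCES = 1_000    # hypothetical number of app instances calling the store

# Little's law, L = lambda * W, computed in integer milliseconds to avoid float error.
ops_in_flight = OPS_PER_SECOND * LATENCY_MS // 1000
ops_in_flight_per_instance = ops_in_flight / CLIENT_INSTANCES

print(ops_in_flight)               # 27000
print(ops_in_flight_per_instance)  # 27.0
```

At most about 27,000 operations are outstanding at any instant, so under that hypothetical fleet size each instance needs only a few dozen concurrent connections to the store.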

Cost of downtime

~$1M per minute

A minute of downtime during peak was reported to cost Dream11 around one million dollars, which is why the design targets 99.99 percent uptime and pre-provisions capacity.
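A 99.99 percent target still leaves a small downtime budget. The arithmetic below assumes a 365-day year and a 30-day month, and prices the budget as if every allowed minute fell at peak, where the reported one-million-dollar figure applies. Off-peak minutes cost less, so this is a worst case.

```python
MINUTES_PER_YEAR = 365 * 24 * 60      # 525,600
MINUTES_PER_MONTH = 30 * 24 * 60      # 43,200 (30-day month)
PEAK_COST_PER_MINUTE_USD = 1_000_000  # reported cost of one minute down at peak


def downtime_budget_minutes(period_minutes: int, nines: int) -> float:
    """Allowed downtime for an availability target with `nines` nines (4 means 99.99%)."""
    return period_minutes / 10 ** nines


yearly = downtime_budget_minutes(MINUTES_PER_YEAR, 4)
monthly = downtime_budget_minutes(MINUTES_PER_MONTH, 4)
print(f"{yearly:.2f} min/year, {monthly:.2f} min/month")             # 52.56 min/year, 4.32 min/month
print(f"worst case ${yearly * PEAK_COST_PER_MINUTE_USD:,.0f}/year")  # worst case $52,560,000/year
```

About four minutes a month is less than the time a reactive scale-out takes, which is why a single bad match can consume the whole budget.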

How Dream11 actually does it

Frequently asked questions

Dream11 System Design Interview: Fantasy Sports at Scale

Why this question is asked

Requirements

Functional requirements

Non-functional requirements

Back-of-envelope scale estimates

High-level architecture

Core components

Scaler: prediction-based pre-scaling

Edge and API layer

Contest and team service (MySQL)

Aerospike real-time store

Live scoring engine (Aryabhata on Spark)

Leaderboard store and cache

Wallet and payments

Social graph (Neptune + Redis)

Data model

Deep dives

The pre-match spike and why reactive autoscaling fails

Contest joins under extreme concurrency

Live scoring and the leaderboard fan-out

The immutable, append-only scoring pipeline

Data stores: matching each workload to the right engine

Availability, cost, and the price of a bad minute

Trade-offs to discuss

Prediction-based pre-scaling versus reactive autoscaling

MySQL for contests and money versus a single scalable NoSQL store

Recomputing leaderboards on a fixed cycle versus updating per event

Immutable append-only scoring versus mutable in-place updates

Aerospike as a combined persistence and cache versus a relational persistence layer

Spot and mixed instances versus all on-demand capacity

How Dream11 actually does it

Lessons to study before this interview

Frequently asked questions

Why is the traffic on Dream11 so spiky?

Why can't Dream11 just use normal autoscaling for the match spike?

How does Dream11 make sure a contest is not oversold when millions join at once?

How does Dream11 score millions of teams live during a match?

What happens when an umpire reverses a decision after points were awarded?

Which databases does Dream11 use, and why more than one?

How is designing Dream11 different from a normal high-traffic app?
