Rapido System Design Interview: Bike-Taxi Dispatch at Scale
Rapido grew to nearly half of India's ride-hailing market on the back of two-wheeler taxis, reaching about 50 million monthly active users and 2 million earning captains. It also flipped the money model: instead of taking a cut of every ride, it charges captains a flat subscription and lets them keep the whole fare, which brought in about 275 crore rupees of subscription income in FY25.
Designing Rapido is a ride-hailing problem with two things that make it distinct. First, the core vehicle is a two-wheeler, and Rapido has published how it matches riders to nearby bikes, moving from a simple radius-and-straight-line approach to hex-grid geography with learned driving times. Second, the money model is different: rather than taking a commission on each ride, Rapido charges captains a flat subscription and lets them keep the full fare, which changes the settlement path from a per-ride commission ledger to a subscription-entitlement check. This walkthrough covers the two-wheeler dispatch that Rapido has published, the subscription money model and its system implications, and the data platform behind it, and is honest about which parts are published versus reasoned.
Asked at: Rapido, Ola, Uber, and other mobility teams. The general forms of the question (design a ride-hailing app, a real-time matching system, or a subscription-billing system) show up at most product companies in SDE2 and SDE3 rounds. Rapido is a good question because it foregrounds two-wheeler matching and a genuinely different money model, so it does not collapse into a generic ride-hailing answer.
Why this question is asked
Ride-hailing tests real-time geospatial matching, but Rapido adds two distinctive angles that interviewers like. The dispatch problem is one Rapido has actually written about, including why a straight-line distance is misleading and how it moved to hex-grid geography and learned driving times, which makes for a concrete matching discussion. And the money model is a real design difference: charging captains a subscription instead of a per-ride commission replaces the commission calculation and ledger with an entitlement check and full-fare settlement, which is a different billing and money-movement design. Interviewers use Rapido to see whether you can design the matching core, reason about why driving time beats straight-line distance, and think through what a subscription model changes in the settlement path, rather than reciting a generic dispatch diagram.
Requirements
Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.
Functional requirements
- Rider requests a ride and is matched to a nearby captain, primarily on a two-wheeler, and also auto or cab
- Rider tracks the assigned captain approaching, then tracks the trip to the destination
- Fare is shown up front and settled at the end, by UPI, wallet, or cash
- Captains can subscribe to a plan that lets them take rides without a per-ride commission and keep the full fare
- Where the subscription model applies, the system verifies a captain's subscription entitlement instead of computing a commission on each ride
- Support multiple vehicle types (bike, auto, cab) and also delivery, on one platform
- Show estimated time of arrival and trip time, learned from real ride data
- Handle demand across large cities and smaller towns
Non-functional requirements
- Match a request to a captain within seconds, with nearby-captain queries answered in the low tens of milliseconds
- Accurate estimated arrival and trip times based on real driving conditions, not straight-line distance
- Live tracking updated every few seconds during a trip
- Correct money settlement: full fare to the captain under the subscription model, or the right split under commission
- High availability on the request, tracking, and settlement paths
- Scale to millions of rides a day across 100-plus cities
- Cost efficiency, since a mass-market, low-fare model depends on low cost per ride
Back-of-envelope scale estimates
Show your math. Pulling numbers from thin air signals you have not thought about the load.
Monthly active users
~50M (2025)
Rapido reported about 50 million monthly active users in 2025, ahead of some rivals. This is the demand side of the matching problem.
Active captains
~2M earners
Rapido reported about 2 million active earning captains, with roughly 160,000 new drivers onboarded a month. Captains are the supply side, and the subscription model is aimed at them.
Rides per day
~1.5M/day (2022, published)
Rapido's own data-platform post cited about 1.5 million rides a day across 100-plus cities as of late 2022. It has grown since, but this is the strongest figure Rapido itself published; treat later, larger daily-ride numbers from secondary blogs as unverified.
Subscription income
~275 crore rupees/year (~14x YoY)
Rapido reported about 275 crore rupees of subscription income in FY25, roughly 14 times the year before. This is the clearest evidence that the subscription money model is real and material, not a pilot.
Market position
~50% ride-hailing, bike-taxi leader
Rapido reportedly holds about half of India's ride-hailing market overall and leads bike taxis, having grown fast in autos and cabs too. Its two-wheeler core is what set it apart from cab-first rivals.
Data platform scale
15+ PB/month, ~1,000 pipelines
Rapido's data-platform post reported processing more than 15 petabytes a month across about 1,000 pipelines and 3,000 datasets, with 12,000 to 15,000 queries a day. This is the analytics backbone behind matching and pricing.
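The published figures above can be turned into rough load numbers. The sketch below is illustrative arithmetic only: the ride and captain counts come from the figures above, while the peak factor, the share of captains online, and the ping interval are assumptions chosen for the example, not Rapido-published values.

```python
# Back-of-envelope load estimate.
# Published inputs: ~1.5M rides/day (late 2022), ~2M active captains (2025).
# Everything marked ASSUMPTION is a guess for illustration, not a Rapido figure.

SECONDS_PER_DAY = 24 * 60 * 60

rides_per_day = 1_500_000
active_captains = 2_000_000

peak_factor = 5        # ASSUMPTION: peak load is about 5x the daily average
online_percent = 10    # ASSUMPTION: 10% of captains are online at peak
ping_interval_s = 5    # ASSUMPTION: one location ping every 5 seconds

avg_rides_per_s = rides_per_day / SECONDS_PER_DAY
peak_rides_per_s = avg_rides_per_s * peak_factor
online_captains = active_captains * online_percent // 100
location_writes_per_s = online_captains / ping_interval_s

print(f"average ride requests/s: {avg_rides_per_s:.1f}")
print(f"peak ride requests/s:    {peak_rides_per_s:.1f}")
print(f"captains online at peak: {online_captains}")
print(f"location writes/s:       {location_writes_per_s:.0f}")
```

Under these assumptions, location writes outnumber ride requests by a few hundred to one, which is the usual argument for keeping live positions in memory rather than in a durable store.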
High-level architecture
Design Rapido around two-wheeler matching and a different money model. The dispatch evolution and the data platform are things Rapido has published, while the subscription entitlement system is reasoned from the publicly stated model, and the surge and live-location internals are the general ride-hailing pattern.
The matching core answers which captain should take a request. When a rider requests a ride, the system finds nearby captains and picks a good one. Rapido has described how this evolved. The original approach was a simple radial system: draw a circle of a couple of kilometers around the rider, find the captains inside, and order them by straight-line distance. The problem it hit is that straight-line distance misleads, because a captain who is close as the crow flies may be far by road across a ring road or a railway crossing. So Rapido moved to partitioning geography with a hexagonal grid, using the H3 system at a fixed resolution, and to learning driving times from historical rides rather than trusting straight-line distance. It predicts two things: the time for a captain to reach the rider, and the time for the trip itself, both bucketed by time of day and day of week, so matching and estimates reflect how the city actually moves. A two-week A/B test in one city showed that this meaningfully cut the estimated arrival time. Live captain positions are kept in an in-memory geo index so proximity queries are fast, which is the standard ride-hailing approach.
The money model is where Rapido is distinctive. Ola and Uber take a commission on each ride, which means computing a split and keeping a commission ledger per trip. Rapido shifted much of its supply to a subscription model: a captain pays a flat subscription, and then keeps the full fare on rides, with no per-ride commission. Rapido has publicly framed this and reported large subscription income from it. The exact entitlement system is not published, so the following is reasoned from the stated model. The settlement path changes shape: instead of calculating and recording a commission on every trip, the system checks whether the captain holds an active subscription and, if so, settles the whole fare to them, so the recurring subscription billing becomes the revenue event rather than the per-ride cut. This is a genuinely different billing and money-movement design, and it is worth being clear that the concept is well established while the internals are inferred.
Behind both sits a data platform Rapido has published. Kafka ingests millions of events a day and feeds a layered lake (raw, then cleaned and enriched, then curated), queried through Trino across several clusters with Spark for processing, on Google Cloud using low-cost spot machines for efficiency. This platform is what powers the learned driving times, pricing, and analytics.
In a real interview, sketch this on the whiteboard before diving into any single box.
Core components
Walk through each service. The interviewer wants to hear what each one owns, not just the names.
Ride request and matching service
Takes a rider's request and selects a good nearby captain, primarily a two-wheeler. Rapido has published how this moved from a simple radius-and-straight-line approach to hex-grid geography with learned driving times, so matching reflects real road travel rather than crow-flying distance.
Geospatial index and driving-time models
Partitions geography with the H3 hexagonal grid at a fixed resolution and keeps live captain positions in memory for fast proximity queries. Machine-learned models predict the captain-to-rider time and the trip time from historical rides, bucketed by time of day and day of week, replacing straight-line distance.
Trip service and tracking
Owns the trip lifecycle from request to assignment to completion, and streams the captain's live position to the rider every few seconds. The live-location streaming follows the general ride-hailing pattern rather than a Rapido-published internal.
Subscription and entitlement service
The distinctive money component. Captains buy a flat subscription, and the system checks that a captain holds an active subscription rather than computing a commission per ride. Rapido has publicly stated the model and reported large subscription income; the internal entitlement design is reasoned from that, not published in detail.
Fare and settlement
Prices the ride and settles money at trip end. Under the subscription model the full fare goes to the captain with no per-ride commission, so the recurring subscription is the revenue event; where commission still applies, a split is computed. Payments across UPI, wallet, and cash are handled idempotently.
Multi-modal platform
One platform serving bike, auto, and cab rides, plus delivery, which grew into Rapido's largest single revenue stream. Different vehicle types can carry different money models, so the platform supports both subscription and commission settlement behind a shared matching core.
Data platform
The analytics and machine-learning backbone. Kafka ingests millions of events a day into a layered lake, queried with Trino across several clusters and processed with Spark, on Google Cloud with low-cost spot machines. It processes more than 15 petabytes a month and powers the learned driving times and pricing.
Data model
Pick the right store per table. Justify each choice with the access pattern, not by reflex.
captains
- captain_id (PK)
- vehicle_type (bike|auto|cab)
- is_online
- current_city_id
- subscription_state
- rating
The captain and current state. subscription_state records whether they hold an active subscription, which decides settlement. Live position is not here; it lives in the in-memory geo index.
captain_locations
- captain_id (PK)
- lat
- lng
- h3_cell
- updated_at
In-memory only, keyed by H3 hex cell so proximity queries are cheap. Overwritten every few seconds. The highest-volume data in the ride path, never durably persisted at full fidelity.
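A cell-keyed, overwrite-in-place index like captain_locations can be sketched in a few lines. This is a minimal sketch, not Rapido's implementation: to stay dependency-free it uses a square grid (`cell_of`, `CELL_DEG`) as a stand-in for H3 hex cells, and the class and method names are hypothetical.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # ASSUMPTION: ~1 km square cells, standing in for an H3 resolution


def cell_of(lat, lng):
    """Map a coordinate to a grid cell id (stand-in for an H3 cell lookup)."""
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))


class CaptainLocationIndex:
    """In-memory index: cell -> captain ids, plus each captain's latest fix."""

    def __init__(self):
        self._by_cell = defaultdict(set)
        self._latest = {}  # captain_id -> (lat, lng, cell, updated_at)

    def upsert(self, captain_id, lat, lng, updated_at):
        """Overwrite the captain's position, moving them between cells if needed."""
        new_cell = cell_of(lat, lng)
        old = self._latest.get(captain_id)
        if old is not None and old[2] != new_cell:
            self._by_cell[old[2]].discard(captain_id)
        self._by_cell[new_cell].add(captain_id)
        self._latest[captain_id] = (lat, lng, new_cell, updated_at)

    def nearby(self, lat, lng, ring=1):
        """Return captains in the rider's cell and the surrounding ring of cells."""
        cx, cy = cell_of(lat, lng)
        found = set()
        for dx in range(-ring, ring + 1):
            for dy in range(-ring, ring + 1):
                found |= self._by_cell.get((cx + dx, cy + dy), set())
        return found


index = CaptainLocationIndex()
index.upsert("c1", 17.3853, 78.4867, updated_at=0)
index.upsert("c2", 17.3921, 78.4912, updated_at=0)
index.upsert("c3", 17.5012, 78.6012, updated_at=0)
print(sorted(index.nearby(17.3853, 78.4867)))
```

A proximity query touches only the rider's cell and its neighbors, so its cost depends on local captain density rather than on the total number of captains online.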
rides
- ride_id (PK)
- rider_id
- captain_id
- vehicle_type
- pickup
- drop
- state
- fare_paise
- settlement_mode (subscription|commission)
- requested_at
The trip record. settlement_mode records how money is handled: full fare to the captain under subscription, or a split under commission. Strong consistency on the money and the trip state.
subscriptions
- subscription_id (PK)
- captain_id
- plan
- valid_from
- valid_to
- status
A captain's subscription plan and validity window. The entitlement check reads this to decide whether the captain can take rides commission-free. The recurring charge here is the revenue event under the subscription model.
ride_events
- event_id (PK)
- ride_id (FK)
- type
- ts
The event stream for a ride: requested, assigned, arrived, started, completed. Flows through Kafka into the data platform for analytics and for training the driving-time models.
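The five event types form a small state machine, and enforcing it is one way to keep trip state consistent. The sketch below is an illustration under assumptions: the `cancelled` state and the function names are hypothetical additions, since only the five happy-path events are listed here.

```python
# Allowed transitions for the ride lifecycle.
# "cancelled" is an ASSUMPTION added for illustration.
TRANSITIONS = {
    "requested": {"assigned", "cancelled"},
    "assigned": {"arrived", "cancelled"},
    "arrived": {"started", "cancelled"},
    "started": {"completed"},
    "completed": set(),
    "cancelled": set(),
}


def apply_event(current_state, event_type):
    """Return the new state, or raise if the event is not allowed from this state."""
    if event_type not in TRANSITIONS.get(current_state, set()):
        raise ValueError(f"illegal transition: {current_state} -> {event_type}")
    return event_type


def replay(event_types):
    """Fold an ordered list of event types into the final ride state."""
    if not event_types or event_types[0] != "requested":
        raise ValueError("a ride must start with 'requested'")
    state = "requested"
    for event_type in event_types[1:]:
        state = apply_event(state, event_type)
    return state


print(replay(["requested", "assigned", "arrived", "started", "completed"]))
```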
payments
- payment_id (PK)
- ride_id (FK)
- amount_paise
- method (upi|wallet|cash)
- status
- idempotency_key
Fare settlement per ride, idempotent so a retry never double-charges. Cash is common in this market and reconciles at trip end.
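The role of idempotency_key can be shown with a small in-memory sketch. It assumes the client generates one key per settlement attempt and reuses it on retries; the `PaymentStore` class and its dictionary-backed storage are hypothetical stand-ins for a real table with a unique constraint on the key.

```python
class PaymentStore:
    """Hypothetical in-memory payments table keyed by idempotency_key."""

    def __init__(self):
        self._by_key = {}

    def settle(self, idempotency_key, ride_id, amount_paise, method):
        """Record a settlement once; a retry with the same key returns the first record."""
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            same_payload = (
                existing["ride_id"] == ride_id
                and existing["amount_paise"] == amount_paise
                and existing["method"] == method
            )
            if not same_payload:
                raise ValueError("idempotency key reused with a different payload")
            return existing
        record = {
            "payment_id": f"pay_{len(self._by_key) + 1}",
            "ride_id": ride_id,
            "amount_paise": amount_paise,
            "method": method,
            "status": "settled",
        }
        self._by_key[idempotency_key] = record
        return record

    def count(self):
        return len(self._by_key)


store = PaymentStore()
first = store.settle("key-123", "ride-1", 8_000, "upi")
retry = store.settle("key-123", "ride-1", 8_000, "upi")  # e.g. after a client timeout
print(first["payment_id"], retry["payment_id"], store.count())
```

Rejecting a reused key with a different payload is a design choice in this sketch; it surfaces client bugs instead of silently returning an unrelated payment.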
Deep dives
These are the conversations the interviewer is steering you toward. Practice each one until you can talk through it without notes.
Two-wheeler dispatch: from radius to hex grid and learned driving time
Rapido has published its dispatch evolution, which makes for a concrete matching discussion. The first version was a simple radial system: on a request, draw a circle of a couple of kilometers, find the captains inside, order them by straight-line distance, and ping them in that order. The flaw it identified is that straight-line distance is misleading. A captain who looks close as the crow flies may be far by road, because a ring road, a railway crossing, or a divider forces a long detour, so the nearest-by-line captain is not the fastest to arrive. Rapido moved to two changes. It partitioned geography with the H3 hexagonal grid at a fixed resolution, which gives clean, uniform cells to index captains and reason about neighborhoods. And it learned driving times from historical rides instead of trusting distance, predicting both the time for a captain to reach the rider and the time for the trip, bucketed by time of day and day of week so the estimates reflect real traffic patterns. A two-week A/B test in one city showed a meaningful reduction in estimated arrival time. The interview lesson is that in real cities, travel time, learned from data and indexed on a good spatial grid, beats straight-line distance for matching.
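The idea of bucketed, learned travel time can be illustrated with a simple lookup table. This is a deliberately simplified sketch, not Rapido's model: it uses a per-bucket median of historical durations as a stand-in for a machine-learned predictor, treats cells as opaque ids, and the class name, function names, and default value are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def bucket_of(ts):
    """Bucket a timestamp by (day of week, hour of day)."""
    return (ts.weekday(), ts.hour)


class TravelTimeModel:
    """Median travel time per (origin cell, destination cell, time bucket)."""

    def __init__(self, default_seconds=600):
        self._samples = defaultdict(list)
        self._default = default_seconds  # ASSUMPTION: fallback when no history exists

    def observe(self, origin_cell, dest_cell, started_at, duration_s):
        """Record one historical ride between two cells."""
        key = (origin_cell, dest_cell, bucket_of(started_at))
        self._samples[key].append(duration_s)

    def predict(self, origin_cell, dest_cell, at):
        """Predict travel time in seconds for the bucket that contains `at`."""
        samples = self._samples.get((origin_cell, dest_cell, bucket_of(at)))
        return median(samples) if samples else self._default


def rank_captains(model, rider_cell, captain_cells, at):
    """Order captain ids by predicted time to reach the rider, fastest first."""
    return sorted(
        captain_cells,
        key=lambda cid: model.predict(captain_cells[cid], rider_cell, at),
    )


model = TravelTimeModel()
monday_9am = datetime(2024, 1, 1, 9, 0)  # a Monday
for duration in (120, 180, 150):
    model.observe("cell_A", "cell_R", monday_9am, duration)
for duration in (600, 700):
    model.observe("cell_B", "cell_R", monday_9am, duration)

next_monday = datetime(2024, 1, 8, 9, 30)  # same bucket: Monday, hour 9
ranked = rank_captains(model, "cell_R", {"b": "cell_B", "a": "cell_A"}, next_monday)
print(ranked)
```

Because prediction is a dictionary lookup, it is cheap enough to run for every candidate at match time, which is the property the learned approach depends on.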
The subscription model and how it changes settlement
This is Rapido's flagship difference. Ola and Uber take a commission on each ride, so their money system computes a fare split per trip and keeps a commission ledger. Rapido shifted much of its supply to a subscription model: a captain pays a flat subscription, and in return keeps the full fare on rides, with no per-ride commission. Rapido has publicly stated this model and reported about 275 crore rupees of subscription income in FY25, roughly 14 times the year before, so it is real and material. The exact entitlement system is not published, so the following is reasoned from the stated model. The settlement path changes shape: instead of calculating a commission and recording it on every trip, the system performs an entitlement check (does this captain hold an active subscription?) and, if so, settles the entire fare to them. The per-ride money movement becomes simpler, while the recurring subscription billing becomes the revenue event. Designing that means a subscription service with plans and validity windows, an entitlement check on the ride path, and recurring billing, which is a different money design from a commission engine. It is worth being explicit in an interview that the model is well established publicly while its internals are inferred.
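The reasoning above, an entitlement check followed by full-fare or split settlement, can be sketched as follows. Everything here is inferred for illustration rather than taken from Rapido: the `Subscription` fields mirror the subscriptions table, the status values are hypothetical, and the 20 percent commission rate is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Subscription:
    captain_id: str
    valid_from: datetime
    valid_to: datetime
    status: str  # hypothetical values: "active", "expired", "cancelled"


def is_entitled(sub: Optional[Subscription], at: datetime) -> bool:
    """True if the captain holds an active subscription covering time `at`."""
    return (
        sub is not None
        and sub.status == "active"
        and sub.valid_from <= at < sub.valid_to
    )


def settle_fare(fare_paise, sub, at, commission_bps=2000):
    """Return (settlement_mode, captain_paise, platform_paise).

    commission_bps=2000 (20%) is an ASSUMPTION for the non-subscription path.
    Integer paise and basis points avoid floating-point rounding on money.
    """
    if is_entitled(sub, at):
        return ("subscription", fare_paise, 0)
    platform_paise = fare_paise * commission_bps // 10_000
    return ("commission", fare_paise - platform_paise, platform_paise)


trip_end = datetime(2025, 3, 10, 18, 0)
active = Subscription("cap-1", datetime(2025, 3, 1), datetime(2025, 4, 1), "active")
lapsed = Subscription("cap-2", datetime(2025, 1, 1), datetime(2025, 2, 1), "active")

print(settle_fare(8_000, active, trip_end))
print(settle_fare(8_000, lapsed, trip_end))
```

In this sketch the validity window is checked as well as the status flag, so a subscription whose window has passed does not grant full-fare settlement even if its status was never updated.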
Why straight-line distance is the wrong metric
It is worth dwelling on the specific insight Rapido called out, because it generalizes. Straight-line, or Euclidean, distance is cheap to compute and is what a naive nearby-captain query returns, but it ignores the road network. Two captains equally far by line from a rider can be very different by road: one on the same street arrives in two minutes, while the other, across a highway with the nearest crossing a kilometer away, arrives in fifteen. Matching on straight-line distance therefore picks the wrong captain and gives the rider a wrong arrival estimate. The fix is to estimate actual travel time, which depends on the road layout and the current traffic, and the practical way to do that at scale is to learn it from the huge history of past rides between areas, bucketed by when they happened, rather than to compute a live route for every candidate. This is a common and important pattern in mobility systems: replace a geometric proxy with a learned, data-driven estimate of the thing you actually care about, which here is time, not distance.
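A small example makes the divergence concrete. The haversine formula below is the standard great-circle distance; the coordinates and the arrival times are made-up values for illustration, with the arrival times standing in for what a learned travel-time model might return.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle (straight-line) distance between two points, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


rider = (17.4000, 78.4800)

# ASSUMPTION: invented positions and arrival times, for illustration only.
captains = {
    "across_highway": {"pos": (17.4020, 78.4800), "eta_s": 900},
    "same_street": {"pos": (17.4000, 78.4860), "eta_s": 180},
}

distance_km = {
    name: haversine_km(*rider, *info["pos"]) for name, info in captains.items()
}
by_distance = sorted(captains, key=lambda name: distance_km[name])
by_eta = sorted(captains, key=lambda name: captains[name]["eta_s"])

print("nearest by straight line:", by_distance[0])
print("fastest to arrive:       ", by_eta[0])
```

Here the captain who is closest by straight line is the one behind the barrier, so a distance-ordered dispatch pings the slower captain first.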
One platform, several vehicle types and money models
Rapido runs bikes, autos, and cabs, and has grown a large delivery business on top, so a single platform serves several vehicle types, and different types can carry different money models. That matters for the design: the matching core (find a nearby suitable captain and assign the trip) is shared, but settlement can differ. Some supply runs on the subscription model with full-fare settlement, and some can run on commission, so the money layer has to support both behind the same ride flow. Keeping the matching engine agnostic to vehicle type and money model, and pushing those differences into configuration and the settlement service, is what lets one platform span two-wheelers, three-wheelers, cars, and parcels without forking the core. The interview point is separation of concerns: a shared real-time matching core, with vehicle-type and money-model differences isolated at the edges.
The data platform behind matching and pricing
The learned driving times, pricing, and analytics all depend on a large data pipeline that Rapido has published. Kafka ingests millions of events a day (ride requests, assignments, locations, completions) with at-least-once delivery into a layered lake: a raw immutable layer, then a cleaned and enriched canonical layer, then curated datasets for specific uses. Queries run through Trino, split across several clusters for different workloads behind a gateway, with Spark for heavier processing, all on Google Cloud using low-cost spot machines with automatic switching to cut cost. Rapido reported this platform processing more than 15 petabytes a month across about a thousand pipelines. This belongs in a system-design answer because matching quality, meaning the driving-time models that make dispatch good, is only as good as the data platform that trains those models. A mobility system is as much a data-engineering problem as a real-time serving one, and the two are tightly linked.
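At-least-once delivery means the raw layer can contain the same event more than once, so the step from raw to cleaned typically deduplicates. The sketch below shows that step in plain Python under assumptions: events are dictionaries shaped like the ride_events table, `ts` is in epoch seconds, and the function names are hypothetical. A real pipeline would do this in Spark or SQL rather than in memory.

```python
from collections import Counter

REQUIRED_FIELDS = {"event_id", "ride_id", "type", "ts"}


def build_cleaned_layer(raw_events):
    """Raw -> cleaned: drop malformed rows and redelivered duplicates, order by time."""
    seen_ids = set()
    cleaned = []
    for event in raw_events:
        if not REQUIRED_FIELDS.issubset(event):
            continue  # malformed row stays in the raw layer only
        if event["event_id"] in seen_ids:
            continue  # duplicate from at-least-once delivery
        seen_ids.add(event["event_id"])
        cleaned.append(event)
    return sorted(cleaned, key=lambda e: e["ts"])


def build_curated_completions_per_hour(cleaned_events):
    """Cleaned -> curated: completed rides per hour bucket (ts in epoch seconds)."""
    return Counter(
        e["ts"] // 3600 for e in cleaned_events if e["type"] == "completed"
    )


raw = [
    {"event_id": "e1", "ride_id": "r1", "type": "requested", "ts": 10},
    {"event_id": "e2", "ride_id": "r1", "type": "completed", "ts": 4000},
    {"event_id": "e2", "ride_id": "r1", "type": "completed", "ts": 4000},  # redelivered
    {"event_id": "e9", "ride_id": "r3", "type": "completed"},  # missing ts
    {"event_id": "e3", "ride_id": "r2", "type": "completed", "ts": 3700},
]

cleaned = build_cleaned_layer(raw)
curated = build_curated_completions_per_hour(cleaned)
print([e["event_id"] for e in cleaned], dict(curated))
```

Keeping the raw layer immutable means a bug in the cleaning step can be fixed and the cleaned and curated layers rebuilt from the original events.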
Trade-offs to discuss
Every senior interviewer expects you to surface at least 3 of these. Pick the decisions, state the alternatives, and justify your choice.
Subscription and full-fare settlement versus per-ride commission
A per-ride commission scales revenue directly with rides and needs no upfront commitment from captains, but it means captains give up a cut of every fare, which hurts supply loyalty, and it requires a commission calculation and ledger per trip. A flat subscription lets captains keep the whole fare, which is attractive to supply, and turns revenue into predictable recurring billing, at the cost of decoupling revenue from ride volume and needing an entitlement system. Rapido's large subscription income shows the trade paying off for supply growth, and the settlement path becomes an entitlement check rather than a commission computation.
H3 hex grid and learned driving time versus radius and straight-line distance
A radius-and-straight-line approach is trivial to build and fast, but it picks captains by a metric that ignores the road network, giving wrong matches and wrong arrival estimates across barriers. Indexing on a hex grid and learning actual driving times from history gives matches and estimates that reflect how the city moves, which Rapido showed reduced estimated arrival time. The cost is building and training the driving-time models and the data platform behind them, accepted because match quality and accurate estimates directly affect the experience.
Learned driving times versus live routing for every candidate
Computing a live route for every candidate captain on every request would be the most accurate but is far too expensive at the request rate. Learning travel times between areas from the large history of past rides, bucketed by time, gives a good estimate cheaply and instantly at match time. The cost is that learned estimates can miss a sudden, unusual condition, mitigated by the time bucketing and by refreshing the models, accepted because it is the only way to estimate travel time fast enough for real-time matching.
One multi-modal platform versus separate systems per vehicle type
Separate systems per vehicle type would each be simpler, but they would duplicate the matching, tracking, and settlement logic and make it hard to share improvements. One platform with a shared matching core and vehicle-type and money-model differences isolated at the edges avoids that duplication and lets bikes, autos, cabs, and delivery reuse the same engine. The cost is a more general, carefully abstracted core, accepted because Rapido spans several modes and money models.
Spot machines and a tiered data lake versus on-demand and a single store
Running the data platform on on-demand machines with one big store is simpler and more predictable, but expensive at 15 petabytes a month for a low-fare business. Using low-cost spot machines with automatic failover and a layered lake with query engines suited to each workload cuts cost sharply. The cost is tolerating spot reclamation and operating a more complex platform, accepted because cost efficiency matters for a mass-market model and the tiered lake serves both cheap raw storage and fast curated queries.
How Rapido actually does it
Rapido's dispatch evolution and data platform are documented on its own engineering blog, its business and money model are well covered in press and stated by its founders, and the internal subscription-entitlement design is reasoned from the stated model rather than published.
On dispatch, Rapido described moving from a radial system that ordered captains by straight-line distance to partitioning geography with the H3 hexagonal grid and learning driving times from historical rides, predicting both captain-to-rider time and trip time bucketed by time of day and day of week, and reported a meaningful reduction in estimated arrival time from a two-week A/B test in one city.
On its data platform, Rapido reported Kafka ingesting millions of events a day into a layered lake, queried with Trino across several clusters and processed with Spark on Google Cloud with low-cost spot machines, processing more than 15 petabytes a month across about a thousand pipelines and three thousand datasets, with about 1.5 million rides a day across 100-plus cities as of late 2022.
On the business, Rapido reported about 50 million monthly active users and 2 million active captains in 2025, roughly half of India's ride-hailing overall with leadership in bike taxis, revenue of about 934 crore rupees in FY25, and, distinctively, about 275 crore rupees of subscription income growing many times year on year, with delivery now its largest single revenue stream.
Three accuracy notes for the interview. First, the strongest Rapido-published ride figure is about 1.5 million rides a day from 2022; larger recent daily-ride numbers come from secondary blogs and should be treated as unverified. Second, the exact subscription entitlement and billing internals are not published, and which vehicle types use subscription versus commission is reported inconsistently, so describe the model at the concept level, a flat pass and full-fare settlement, which is well verified. Third, live-location streaming and surge internals are not published for Rapido and are the general ride-hailing pattern.
Sources
- Rapido Labs, Improving Dispatch with Data: the move from radial straight-line matching to H3 hex grids and learned driving times, with a Hyderabad A/B test
- Rapido Labs, Data Platform at Rapido (Part I): Kafka, a layered lake, Trino clusters, Spark, Google Cloud spot machines, 15-plus petabytes a month, about 1.5 million rides a day
- Entrackr, Rapido FY25 financials: about 934 crore rupees revenue and 275 crore rupees of subscription income growing many times year on year, with delivery the largest stream
- Forbes India, How Rapido is breaking the Uber-Ola duopoly: about 50 percent share, 50 million monthly active users, 2 million earners, and the founder on the subscription model
- Business Standard, Rapido extends its zero-commission subscription model from cabs to auto drivers
- Wikipedia, Rapido: founding, funding rounds, unicorn valuation, and service timeline