Design Paytm: System Design Interview Guide
UPI processed 21.6 billion transactions in December 2025 alone, an average of roughly 8,000 transactions per second across the month, and a single rupee can never be created or lost in any of them.
Designing Paytm is the canonical India payments problem. You have to move real money between two banks over the NPCI UPI rails, keep a double-entry ledger that always balances, make every operation idempotent so a retry never double-charges, reconcile against the bank end-of-day, and survive a Diwali sale spike that is 5 to 10x a normal Tuesday. It is less about clever algorithms and more about correctness under failure.
Asked at: Paytm, PhonePe, Razorpay, Cred, Google Pay, Amazon Pay, Flipkart, Swiggy, Zerodha, and most India fintech and FAANG-India teams. It is the standard payments and ledger interview for SDE2 and above.
Why this question is asked
Payments is the one domain where eventual consistency is not an acceptable shrug. The interviewer is checking whether you understand that money movement is a distributed transaction across systems you do not control (the user's bank, NPCI, the merchant's bank), that the network will fail mid-flight, and that the only acceptable outcomes are "fully done" or "fully reversed" with a customer who can see exactly what happened. You earn the offer by talking about idempotency keys, a double-entry ledger, the DEEMED state, reconciliation, and what happens when the debit succeeds but the credit times out. You lose it by drawing a "Payment Service to Stripe" box and moving on.
Requirements
Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.
Functional requirements
- User links a bank account and creates a UPI ID (VPA) like name@paytm
- User pays a person or merchant by VPA, QR code, or phone number (push / pay)
- Merchant or user raises a collect request that the payer approves with their UPI PIN (pull / collect)
- User can load money into a Paytm wallet (PPI) and pay from wallet balance
- Every transaction has a status the user can poll: pending, success, failed, refunded
- Failed or timed-out debits are auto-reversed and money returns to the source
- Refunds can be issued by a merchant back to the original payer
- User sees a passbook / statement with a running balance that always reconciles
Non-functional requirements
- Exactly-once money movement: a retry must never debit twice
- The ledger must balance at all times (sum of debits equals sum of credits)
- Strong consistency and durability on the money path; no lost writes
- P99 end-to-end UPI latency under a few seconds (NPCI round trip dominates)
- Survive 5 to 10x spikes during festivals, salary day, and flash sales
- Idempotent against client retries, network timeouts, and duplicate webhooks
- Auditable: every paisa movement is traceable to an immutable ledger entry
- Daily reconciliation with each bank and with NPCI settlement files
Back-of-envelope scale estimates
Show your math. Pulling numbers from thin air signals you have not thought about the load.
- Platform transactions/day: ~250M. Paytm-class share of UPI plus wallet and gateway traffic. UPI as a whole crossed 640M+ transactions/day in 2025; a large PSP handles a sizeable slice. Round to a few hundred million for sizing.
- Average TPS: ~3,000. 250M / 86,400 seconds is roughly 2,900. Paytm publicly cites a payment gateway capable of ~3,000 transactions per second, which lines up.
- Peak TPS (festival / sale): 15K-30K. Apply a 5 to 10x peak factor for Diwali, salary day, and flash sales. UPI system-wide already averages roughly 8,000 TPS (21.6 billion transactions in a 31-day month); a single big PSP must provision well above its daily average.
- Ledger writes per transaction: 2 to 4. Double-entry means at least a debit and a credit row per money movement. Wallet load, P2M with fee, and refunds add more legs. Budget 2 to 4 immutable ledger rows per logical transaction.
- Storage per transaction: ~2 KB. Payment row plus ledger entries plus a state-machine event log. At 250M/day that is ~500 GB/day raw, roughly 180 TB/year before compression, partitioned by date and archived to cold storage after a retention window.
- Read:write ratio: ~5:1. Status polling, passbook loads, balance checks, and reconciliation reads dominate. The write path is the money path; reads are everything else and are served from read replicas and caches.
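The arithmetic behind these estimates is easy to check. The sketch below only reuses the round figures quoted above; the variable names are illustrative and decimal units (1 GB = 1,000,000 KB) are assumed.

```python
SECONDS_PER_DAY = 86_400

tx_per_day = 250_000_000      # platform transactions/day (estimate from the text)
kb_per_tx = 2                 # ~2 KB per transaction (estimate from the text)
rounded_avg_tps = 3_000       # the rounded average used for peak sizing

avg_tps = tx_per_day / SECONDS_PER_DAY
peak_low, peak_high = 5 * rounded_avg_tps, 10 * rounded_avg_tps

storage_gb_per_day = tx_per_day * kb_per_tx / 1_000_000   # KB -> GB, decimal units
storage_tb_per_year = storage_gb_per_day * 365 / 1_000    # GB -> TB

print(f"average TPS  ~ {avg_tps:,.0f}")
print(f"peak TPS     ~ {peak_low:,} to {peak_high:,}")
print(f"storage      ~ {storage_gb_per_day:,.0f} GB/day, {storage_tb_per_year:,.1f} TB/year")
```

The yearly figure comes out slightly above the rounded 180 TB quoted above, which is the kind of rounding an interviewer expects you to state out loud.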
High-level architecture
A client (the Paytm app or a merchant SDK) hits an API Gateway that does auth, rate limiting, and device-fingerprint checks. Behind it, an Orchestrator (the Payment Service) owns the lifecycle of one transaction: it generates or accepts an idempotency key, writes a payment row in a PENDING state, runs synchronous risk and limit checks, and then drives the money movement.
For a UPI payment, the orchestrator calls Paytm's UPI PSP layer, which speaks the NPCI API. NPCI is the central switch: it routes a debit-leg request (ReqPay) to the payer's bank, gets the debit result (RespPay), then routes a credit-leg request to the beneficiary's bank, and finally returns the combined result. For a wallet payment, no bank is involved; the orchestrator just moves balance inside Paytm's own PPI ledger, backed by an escrow account at a scheduled commercial bank.
Every state change is appended to a double-entry Ledger Service: each money movement produces balanced debit and credit rows in a durable, append-only store. State transitions are also emitted as events to Kafka, which feeds asynchronous consumers: notifications, fraud scoring, analytics, settlement, and a Reconciliation Service that compares Paytm's view of every transaction against the bank and NPCI end-of-day files and flags any mismatch. A scheduled reversal worker chases DEEMED and timed-out transactions to a terminal state. The money path uses strong-consistency relational stores (sharded MySQL/InnoDB clusters); the read-heavy and analytics paths use replicas, caches, and a columnar warehouse.
In a real interview, sketch this on the whiteboard before diving into any single box.
Core components
Walk through each service. The interviewer wants to hear what each one owns, not just the names.
API Gateway
Terminates TLS, authenticates the user or merchant, enforces per-user and per-merchant rate limits, and attaches an idempotency key (from the client, or generated server-side and returned). It is the first place a duplicate retry can be cheaply caught.
Payment Orchestrator
The brain of one transaction. Owns the state machine (INITIATED, RISK_CHECKED, DEBIT_PENDING, CREDIT_PENDING, DEEMED, SUCCESS, FAILED, REVERSED). It deliberately keeps no state in memory between requests; all durable state lives in the payment row and the ledger. It calls risk, the UPI PSP, and the ledger in a strict order so a crash at any point is recoverable.
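One way to make the state machine concrete is an explicit transition table that rejects illegal moves. The sketch below is an illustration only: the guide names the states but not the edges, so the `ALLOWED` map is an assumption, and DEEMED is included because the guide treats it as an explicit state elsewhere.

```python
from enum import Enum


class State(Enum):
    INITIATED = "INITIATED"
    RISK_CHECKED = "RISK_CHECKED"
    DEBIT_PENDING = "DEBIT_PENDING"
    CREDIT_PENDING = "CREDIT_PENDING"
    DEEMED = "DEEMED"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    REVERSED = "REVERSED"


# Hypothetical edge list: which target states each state may move to.
ALLOWED = {
    State.INITIATED: {State.RISK_CHECKED, State.FAILED},
    State.RISK_CHECKED: {State.DEBIT_PENDING, State.FAILED},
    State.DEBIT_PENDING: {State.CREDIT_PENDING, State.SUCCESS, State.FAILED, State.REVERSED},
    State.CREDIT_PENDING: {State.SUCCESS, State.DEEMED, State.REVERSED},
    State.DEEMED: {State.SUCCESS, State.REVERSED},
    State.SUCCESS: set(),
    State.FAILED: set(),
    State.REVERSED: set(),
}

# Terminal states are the ones with no outgoing edges.
TERMINAL = {state for state, targets in ALLOWED.items() if not targets}


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Keeping the table as data means a replay after a crash can be validated the same way as a live request: a payment already in a terminal state simply has no legal next move.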
UPI PSP / NPCI Connector
Speaks the NPCI UPI API. Resolves a VPA to a bank account, fires the debit-leg ReqPay to the payer bank, then the credit-leg to the beneficiary bank, and interprets RespPay (SUCCESS, FAILURE, or DEEMED). Handles signing, mandates, collect requests, and the strict timeout windows NPCI imposes.
Ledger Service
An append-only, double-entry book. Every money movement writes balanced debit and credit rows in a single atomic transaction. Account balances are derived from the ledger, never edited in place. This is the source of truth for how much money exists and where it sits.
Wallet / PPI Service
Manages Paytm wallet balances, which are a prepaid payment instrument backed by a pooled escrow account at a bank. Loading money debits the user's bank (via UPI/cards) and credits their wallet ledger; the real cash sits in escrow. Wallet-to-wallet payments are pure ledger moves with no bank round trip, which is why they feel instant.
Risk / Fraud Engine
Runs synchronously in the hot path for hard rules (velocity limits, blocklists, NPCI per-transaction caps) and asynchronously for ML scoring (device, behavior, graph signals). A hard rule can block a payment before debit; soft signals raise a transaction for review and can trigger a hold.
Reconciliation Service
Ingests end-of-day settlement and switch files from each bank and NPCI, joins them against Paytm's internal ledger, and emits a break report for any transaction where the two disagree. It is the safety net that catches a debit the bank recorded but Paytm marked failed, and vice versa.
Reversal / Sweeper Worker
A scheduled job that finds transactions stuck in DEEMED or DEBIT_PENDING past their timeout, queries the bank/NPCI for the true outcome, and drives them to SUCCESS or REVERSED. NPCI now targets reversal of timed-out debits within about 30 seconds, so this worker runs aggressively.
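The sweeper loop can be sketched as a pure function over stuck payments. Everything here is hypothetical scaffolding: `query_status` stands in for whatever status-check call the bank or NPCI connector exposes, the payment dicts stand in for rows, and the 30 second timeout simply mirrors the target mentioned above.

```python
from datetime import datetime, timedelta, timezone

STUCK_STATES = {"DEEMED", "DEBIT_PENDING"}
TIMEOUT = timedelta(seconds=30)   # assumption, mirrors the ~30 second target


def sweep(payments, query_status, now):
    """Drive stuck payments to a terminal state; return the ids that were resolved."""
    resolved = []
    for payment in payments:
        if payment["state"] not in STUCK_STATES:
            continue
        if now - payment["updated_at"] <= TIMEOUT:
            continue                                   # still inside its timeout window
        outcome = query_status(payment["external_ref"])
        if outcome == "SUCCESS":
            payment["state"] = "SUCCESS"
        elif outcome == "FAILURE":
            payment["state"] = "REVERSED"              # a real system also posts the reversal to the ledger
        else:
            continue                                   # still uncertain, retry on the next sweep
        payment["updated_at"] = now
        resolved.append(payment["payment_id"])
    return resolved


now = datetime(2025, 12, 1, 12, 0, tzinfo=timezone.utc)
payments = [
    {"payment_id": "p1", "state": "DEEMED", "external_ref": "rrn1", "updated_at": now - timedelta(seconds=45)},
    {"payment_id": "p2", "state": "DEBIT_PENDING", "external_ref": "rrn2", "updated_at": now - timedelta(seconds=45)},
    {"payment_id": "p3", "state": "DEEMED", "external_ref": "rrn3", "updated_at": now - timedelta(seconds=5)},
]
outcomes = {"rrn1": "SUCCESS", "rrn2": "FAILURE"}
resolved = sweep(payments, lambda ref: outcomes.get(ref, "DEEMED"), now)
```

Note that an uncertain answer leaves the payment untouched rather than guessing, which is the whole point of modeling DEEMED separately.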
Notification Service
Consumes ledger and state events from Kafka to push SMS, in-app, and webhook notifications to users and merchants. Webhooks to merchants are themselves idempotent and retried with backoff until acknowledged.
Data model
Pick the right store per table. Justify each choice with the access pattern, not by reflex.
payments
Columns: payment_id (PK), idempotency_key (UNIQUE), payer_id, payee_vpa, amount_paise, currency, instrument (upi, wallet, card), state, external_ref (NPCI RRN), created_at, updated_at
One row per logical transaction. Amounts are stored in paise as integers, never as floats. idempotency_key has a UNIQUE constraint so a duplicate request collides at the database, not in application logic. Sharded by payer_id. Strong consistency; this is on the money path.
ledger_entries
Columns: entry_id (PK), transaction_id, account_id, direction (debit, credit), amount_paise, balance_after, created_at
Append-only, immutable. Every money movement inserts balanced debit and credit rows in one DB transaction; for each transaction_id, total debits must equal total credits. You never UPDATE a balance; you derive it by summing entries (with periodic snapshot rows to bound the scan). This is the system's source of truth.
accounts
Columns: account_id (PK), owner_id, account_type (user_wallet, escrow, merchant, fee, suspense), cached_balance_paise, version
Logical accounts in the ledger, including internal ones: an escrow account for pooled wallet cash, a fee account, and a suspense account for in-flight or unreconciled money. cached_balance_paise is a denormalized read optimization protected by an optimistic version check; the ledger remains authoritative.
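The optimistic version check on the cached balance can be sketched with a conditional UPDATE. This is a minimal illustration using SQLite in memory rather than the sharded MySQL the guide describes; the `apply_delta` helper and the sample row are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE accounts ("
    " account_id TEXT PRIMARY KEY,"
    " cached_balance_paise INTEGER NOT NULL,"
    " version INTEGER NOT NULL)"
)
db.execute("INSERT INTO accounts VALUES ('wallet:A', 50000, 1)")
db.commit()


def apply_delta(account_id: str, delta_paise: int, expected_version: int) -> bool:
    """Update the cached balance only if nobody else changed the row since we read it."""
    with db:  # commits on success, rolls back on exception
        cursor = db.execute(
            "UPDATE accounts"
            " SET cached_balance_paise = cached_balance_paise + ?, version = version + 1"
            " WHERE account_id = ? AND version = ?",
            (delta_paise, account_id, expected_version),
        )
    return cursor.rowcount == 1   # 0 rows means the version was stale


first = apply_delta("wallet:A", -10_000, expected_version=1)    # succeeds
stale = apply_delta("wallet:A", -10_000, expected_version=1)    # loses: version is now 2
```

A caller that gets `False` re-reads the row and retries; because the ledger stays authoritative, a cache that falls behind can always be rebuilt from the entries.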
transaction_events
Columns: event_id (PK), payment_id (FK), from_state, to_state, reason_code, actor (orchestrator, npci, reversal_worker), created_at
Append-only audit log of every state transition. Lets you replay exactly what happened to a stuck payment and is the first thing support and reconciliation read during a dispute.
recon_breaks
Columns: break_id (PK), payment_id, source (bank, npci), internal_state, external_state, amount_paise, status (open, resolved), detected_at
One row per mismatch found during reconciliation: e.g. NPCI says SUCCESS but Paytm says FAILED. Drives manual and automated resolution. An empty break table at day end is the goal.
Deep dives
These are the conversations the interviewer is steering you toward. Practice each one until you can talk through it without notes.
Idempotency: retrying without double-charging
The hardest failure in payments is ambiguity. A client fires a pay request, the network drops the response, and the client retries. Did the first one go through? The fix is an idempotency key: a unique token the client generates once per logical payment and reuses on every retry of that same payment. The server stores it with a UNIQUE constraint on the payments table. The first request inserts the row and starts the money movement; any retry with the same key collides on the unique index and the server returns the existing payment's current status instead of starting a second debit. The key must be scoped (per user, per amount, with a TTL) so a genuine second payment of the same amount is not wrongly deduplicated. Crucially, idempotency extends downstream: the call to the bank/NPCI carries a unique transaction reference, and the ledger write is keyed by transaction_id, so even the orchestrator crashing and replaying cannot produce two debits.
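The unique-constraint behavior described above can be demonstrated with a small sketch. It uses SQLite in memory as a stand-in for the real payments store, and `create_payment` plus the trimmed-down table are illustrative assumptions, not Paytm's actual schema or API.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    " payment_id TEXT PRIMARY KEY,"
    " idempotency_key TEXT NOT NULL UNIQUE,"
    " payer_id TEXT NOT NULL,"
    " amount_paise INTEGER NOT NULL,"
    " state TEXT NOT NULL)"
)


def create_payment(idempotency_key: str, payer_id: str, amount_paise: int):
    """Return (payment_id, created). created is False when the key was already seen."""
    payment_id = str(uuid.uuid4())
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO payments VALUES (?, ?, ?, ?, 'PENDING')",
                (payment_id, idempotency_key, payer_id, amount_paise),
            )
        return payment_id, True
    except sqlite3.IntegrityError:
        # Duplicate key: return the existing payment instead of starting a second debit.
        existing_id, existing_payer, existing_amount = conn.execute(
            "SELECT payment_id, payer_id, amount_paise FROM payments WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        if (existing_payer, existing_amount) != (payer_id, amount_paise):
            raise ValueError("idempotency key reused with different parameters")
        return existing_id, False


first_id, created = create_payment("key-1", "user-a", 10_000)
retry_id, created_again = create_payment("key-1", "user-a", 10_000)
```

The retry returns the same `payment_id` with `created_again` set to `False`, so the caller knows to report the existing status rather than drive a new money movement.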
Double-entry ledger that always balances
Never store a single mutable 'balance' column and add or subtract from it; a lost update or a partial failure silently destroys money. Instead, model money as movement between accounts. Every transaction writes at least two ledger rows in one atomic DB transaction: a debit on one account and an equal credit on another, summing to zero. A wallet payment from A to merchant M is: debit A's wallet account, credit M's account. A fee splits into a third leg: debit A, credit M for the net, credit a fee account for the cut, still summing to zero. Balances are derived by summing entries (snapshotted periodically so the scan stays bounded). Because the ledger is append-only and immutable, you can replay it, audit it, and prove at any instant that total money in the system is conserved. This is the single most important thing to say in a Paytm interview.
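A minimal sketch of the posting rule, again using SQLite in memory as a stand-in: `post` refuses any set of legs where debits and credits differ, and `balance` derives an account's position by summing entries. The account names, the helper names, and the credit-minus-debit sign convention are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE ledger_entries ("
    " entry_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " transaction_id TEXT NOT NULL,"
    " account_id TEXT NOT NULL,"
    " direction TEXT NOT NULL CHECK (direction IN ('debit', 'credit')),"
    " amount_paise INTEGER NOT NULL CHECK (amount_paise > 0))"
)


def post(transaction_id, legs):
    """Insert all legs atomically. legs is a list of (account_id, direction, amount_paise)."""
    debits = sum(amount for _, direction, amount in legs if direction == "debit")
    credits = sum(amount for _, direction, amount in legs if direction == "credit")
    if debits != credits or debits == 0:
        raise ValueError("unbalanced transaction")
    with db:  # one DB transaction: every leg is written, or none
        db.executemany(
            "INSERT INTO ledger_entries (transaction_id, account_id, direction, amount_paise)"
            " VALUES (?, ?, ?, ?)",
            [(transaction_id, account, direction, amount) for account, direction, amount in legs],
        )


def balance(account_id):
    """Derived balance: credits minus debits (convention assumed for this sketch)."""
    return db.execute(
        "SELECT COALESCE(SUM(CASE direction WHEN 'credit' THEN amount_paise"
        " ELSE -amount_paise END), 0) FROM ledger_entries WHERE account_id = ?",
        (account_id,),
    ).fetchone()[0]


# Wallet load of Rs 500, then a Rs 100 payment with a Rs 2 fee (three legs).
post("load-1", [("escrow", "debit", 50_000), ("wallet:A", "credit", 50_000)])
post("pay-1", [("wallet:A", "debit", 10_000), ("merchant:M", "credit", 9_800), ("fee", "credit", 200)])
```

Summing `balance` over every account returns zero after any sequence of successful posts, which is the conservation property the paragraph above describes.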
The UPI flow: debit leg, credit leg, and the DEEMED state
A UPI payment is a distributed transaction across systems Paytm does not own. The PSP sends a ReqPay to NPCI, which routes the debit leg to the payer's bank. The bank debits and returns RespPay. NPCI then routes the credit leg to the beneficiary's bank, which credits and responds. NPCI returns the combined outcome. The dangerous case is the debit succeeding but the credit response timing out: NPCI marks it DEEMED (uncertain), not SUCCESS or FAILURE. You must not show the user 'failed' and let them retry, because the money may already have moved. The orchestrator parks the transaction, and the reversal worker chases the true status with NPCI; if the credit truly failed, the debit is reversed and the payer is refunded (NPCI now targets this within ~30 seconds). Modeling DEEMED explicitly, rather than collapsing it into success or failure, is exactly what separates a senior answer from a junior one.
Reconciliation: trusting the bank, not just yourself
Your internal ledger is your opinion of what happened. The bank and NPCI have their own records, and they will sometimes disagree because of timeouts, partial settlements, and DEEMED transactions. Reconciliation ingests end-of-day files from each bank and the NPCI switch, joins them against the payments and ledger tables by reference number (RRN), and emits a break for every row where state or amount differs. A debit the bank recorded but Paytm marked failed means a customer is out money and must be refunded; the reverse means Paytm credited a merchant for money it never received. Recon is run daily (often hourly for large PSPs), and unresolved breaks are a regulatory and trust problem. The suspense account holds in-flight and unreconciled money so the ledger still balances while a break is open.
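The join-and-compare step can be sketched in a few lines. The `Record` shape and the `reconcile` helper are hypothetical; real settlement files carry many more fields, but the core of the check is a keyed comparison on RRN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    rrn: str            # reference number shared by both sides
    state: str          # e.g. "SUCCESS" or "FAILED"
    amount_paise: int


def reconcile(internal, external, source):
    """Return one break per RRN where the two sides disagree or one side is missing."""
    ours_by_rrn = {record.rrn: record for record in internal}
    theirs_by_rrn = {record.rrn: record for record in external}
    breaks = []
    for rrn in sorted(ours_by_rrn.keys() | theirs_by_rrn.keys()):
        ours, theirs = ours_by_rrn.get(rrn), theirs_by_rrn.get(rrn)
        if ours == theirs:
            continue                       # same state and amount on both sides
        breaks.append({
            "rrn": rrn,
            "source": source,
            "internal_state": ours.state if ours else "MISSING",
            "external_state": theirs.state if theirs else "MISSING",
            "status": "open",
        })
    return breaks


internal = [Record("RRN1", "SUCCESS", 10_000), Record("RRN2", "FAILED", 5_000), Record("RRN3", "SUCCESS", 700)]
external = [Record("RRN1", "SUCCESS", 10_000), Record("RRN2", "SUCCESS", 5_000)]
breaks = reconcile(internal, external, "npci")
```

Here RRN2 is the dangerous case from the paragraph above (the switch recorded SUCCESS while the internal state is FAILED), and RRN3 is a transaction the external file does not contain at all.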
Surviving festival and salary-day spikes
Traffic is brutally spiky: Diwali, the 1st of the month, and flash sales push load to 5 to 10x a normal Tuesday in minutes. The money path cannot be dropped under load the way a 'like' can, so you protect it with layered defenses. Rate limit and queue at the gateway so the bank/NPCI connectors are never overwhelmed (NPCI itself enforces TPS caps per PSP). Make the write path as short as possible: a fast PENDING insert plus async fan-out, rather than synchronous calls to five services. Shard the payments and ledger stores by user/merchant so no single node is the bottleneck, and pre-scale connection pools before the known spike. Use backpressure: if the bank is slow, shed or delay new requests rather than pile up in-flight debits you cannot resolve. And degrade gracefully: wallet payments (pure ledger moves) keep working even if the UPI rails to a specific bank are congested.
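Gateway rate limiting is often implemented as a token bucket; the guide does not say which algorithm Paytm uses, so treat this as one common option. The clock is passed in explicitly to keep the sketch deterministic; a real caller would pass something like `time.monotonic()`.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity     # start full so an idle caller can burst
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False               # caller should queue, delay, or shed the request


bucket = TokenBucket(rate_per_sec=2, capacity=2)
burst = [bucket.allow(0.0) for _ in range(3)]   # third request exceeds the burst
later = bucket.allow(0.5)                       # half a second refills one token
```

One bucket per merchant or per downstream bank connector lets the gateway apply backpressure to a congested bank without throttling wallet payments that never leave the internal ledger.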
Fraud and risk in the hot path
Fraud checks split into synchronous hard rules and asynchronous scoring. Hard rules run before the debit and must be fast: per-transaction and per-day amount caps (NPCI's general per-transaction UPI limit is ₹1 lakh, with higher limits such as ₹5 lakh for specific categories), velocity limits (too many payments too fast), VPA and device blocklists, and new-payee cool-off windows. These can block a payment outright. Heavier ML scoring (device fingerprint, behavioral and graph signals to catch mule networks) runs off the Kafka event stream; a high score can flag a completed transaction for review, place a hold on a payout, or step up auth on the next attempt. The trade-off is latency versus coverage: you cannot run a deep model inline without blowing the latency budget, so the inline layer is cheap rules and the expensive model is near-real-time.
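The inline layer can be sketched as a handful of cheap, in-memory checks. The class name, the return strings, and every threshold below are illustrative placeholders rather than NPCI or Paytm values; a production version would keep the sliding windows in a shared store instead of process memory.

```python
from collections import defaultdict, deque


class HardRules:
    """Synchronous pre-debit checks: blocklist, amount cap, sliding-window velocity."""

    def __init__(self, per_txn_cap_paise, max_payments, window_seconds, blocked_vpas=()):
        self.per_txn_cap_paise = per_txn_cap_paise
        self.max_payments = max_payments
        self.window_seconds = window_seconds
        self.blocked_vpas = set(blocked_vpas)
        self._recent = defaultdict(deque)   # payer_id -> timestamps of allowed payments

    def check(self, payer_id, payee_vpa, amount_paise, now):
        if payee_vpa in self.blocked_vpas:
            return "BLOCK:blocklisted_payee"
        if amount_paise > self.per_txn_cap_paise:
            return "BLOCK:amount_cap"
        window = self._recent[payer_id]
        while window and now - window[0] >= self.window_seconds:
            window.popleft()                # drop timestamps that fell out of the window
        if len(window) >= self.max_payments:
            return "BLOCK:velocity"
        window.append(now)
        return "ALLOW"


rules = HardRules(per_txn_cap_paise=10_000_000, max_payments=3, window_seconds=60,
                  blocked_vpas={"mule@bank"})
decisions = [rules.check("user-a", "shop@paytm", 5_000, t) for t in (0, 1, 2, 3)]
```

The first three payments pass and the fourth inside the same minute is blocked on velocity; each check is a set lookup or a deque operation, which is what keeps this layer inside the latency budget.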
Trade-offs to discuss
Every senior interviewer expects you to surface at least 3 of these. Pick the decisions, state the alternatives, and justify your choice.
Double-entry ledger vs a single mutable balance column
A mutable balance is simple and fast but a lost update or partial failure silently creates or destroys money, and you can never prove what happened. Double-entry is more writes and more storage, but it is auditable, self-balancing, and replayable. For money, this is not a real choice; you pick double-entry and accept the cost. Say so explicitly.
Strong consistency (SQL) vs eventual consistency (NoSQL) on the money path
The ledger and payments tables need ACID transactions and a unique constraint on the idempotency key, which pushes you to a relational store like sharded MySQL/InnoDB or Postgres. NoSQL scales writes more easily but its weaker transactional guarantees are wrong for money. Use NoSQL/caches/replicas for the read-heavy and analytics side, never for the debit/credit write.
Synchronous vs asynchronous money movement
UPI's bank round trip is inherently synchronous and slow (seconds), so you front it with a fast PENDING write and poll for status rather than blocking the user on the full bank latency. Wallet-to-wallet is a pure ledger move and can be synchronous and instant. The cost of async is that the client must handle PENDING and poll, and you must own the DEEMED reconciliation. The benefit is you do not hold a request open across an external system you do not control.
Idempotency at the gateway vs at the database
Catching duplicates at the gateway (cache lookup) is cheap and fast but not authoritative under race conditions. A UNIQUE constraint on idempotency_key at the database is the real guarantee: two concurrent retries cannot both insert. Do both: gateway cache for the common case, DB constraint as the backstop. Relying on the cache alone is a classic interview trap.
Storing amounts as integer paise vs floating point
Floating point cannot represent 0.10 exactly and accumulates rounding error, which in a ledger means money leaks. Always store amounts as integer minor units (paise) or a fixed-precision decimal. This is a small detail that a sharp interviewer will probe, and getting it wrong signals you have not built a real payment system.
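The rounding problem is easy to reproduce. The sketch below shows the float drift and one way to parse user-entered rupee strings into integer paise; `rupees_to_paise` is a hypothetical helper, not part of any library.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly.
float_total = sum([0.10] * 10)        # ten payments of Rs 0.10
paise_total = sum([10] * 10)          # the same ten payments as integer paise

print(0.1 + 0.2 == 0.3)               # False
print(float_total == 1.0)             # False: the float sum has drifted
print(paise_total == 100)             # True: integer addition is exact


def rupees_to_paise(text: str) -> int:
    """Parse a decimal rupee string into integer paise, rejecting sub-paisa precision."""
    paise = Decimal(text) * 100
    if paise != paise.to_integral_value():
        raise ValueError(f"more than two decimal places: {text!r}")
    return int(paise)


amount = rupees_to_paise("199.99")    # 19999
```

Parsing through `Decimal` from the string (never from a float) keeps the conversion exact at the system boundary, after which everything downstream is integer arithmetic.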
Sharding by user vs by merchant vs by time
Sharding payments by payer_id spreads write load and keeps a user's history co-located, but a hot merchant on a sale day concentrates credits on a few shards. Time-based partitioning helps reconciliation and archival but creates a hot 'today' partition. The pragmatic answer is shard the payments by user, partition the ledger by date for archival, and treat hot merchants as a separate scaling concern (dedicated capacity, async credit aggregation).
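The two routing decisions in the pragmatic answer can be sketched as pure functions. The hash choice, the shard count, and the partition naming scheme are all assumptions; the point is only that routing must be deterministic and independent of process state.

```python
import hashlib
from datetime import date


def shard_for(payer_id: str, num_shards: int) -> int:
    """Map a payer to a shard with a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(payer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def ledger_partition(entry_date: date) -> str:
    """Name of the date partition a ledger entry lands in (hypothetical naming scheme)."""
    return f"ledger_entries_{entry_date:%Y_%m_%d}"


shard = shard_for("user-12345", num_shards=64)
partition = ledger_partition(date(2025, 12, 1))   # "ledger_entries_2025_12_01"
```

Plain modulo hashing forces a large data move when the shard count changes, so a real deployment would more likely use a lookup table or consistent hashing; that is a further trade-off worth naming in the interview.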
How Paytm actually does it
Paytm's payment gateway is publicly described as capable of around 3,000 transactions per second, built on open-source MySQL with the InnoDB engine across many independent database clusters, one per business line, each tuned for its own read/write mix. The broader system it plugs into is UPI, run by NPCI as a four-party push-pull switch: a remitter-side PSP and bank, a beneficiary-side PSP and bank, with NPCI routing the debit leg and credit leg and returning a SUCCESS, FAILURE, or DEEMED result. UPI crossed 21.6 billion transactions in December 2025, which averages out to roughly 8,000 transactions per second system-wide, with a general per-transaction limit of ₹1 lakh (higher limits such as ₹5 lakh apply to specific categories) and a push by NPCI to reverse timed-out debits within about 30 seconds. Paytm's wallet is a prepaid payment instrument (PPI): user balances are an internal ledger, while the actual cash sits pooled in an escrow account at a scheduled commercial bank, which is why RBI's PPI and escrow rules shape the design as much as the engineering does. The hard lessons in this space are universal: money is integer paise, retries carry idempotency keys, the ledger is double-entry and append-only, and reconciliation against the bank is the safety net that catches what the happy path missed.
Sources
- Paytm Engineering: How Paytm Handles Millions of Digital Transactions Safely Everyday
- NPCI: UPI Product Overview and statistics
- Wikipedia: Unified Payments Interface (four-party model, limits, volume)
- The Pragmatic Engineer: Designing a Payment System (idempotency, ledger, reconciliation)
- NPCI UPI API / ReqPay-RespPay description (debit/credit leg, DEEMED)
Lessons to study before this interview
If any of these topics are fuzzy, the interviewer will catch it. Each lesson is 15 to 60 minutes with diagrams, code, and a quiz.
- Idempotency (foundation / core fundamentals)
- Distributed Transactions (advanced / distributed systems core)
- Saga Pattern (advanced / distributed systems core)
- Design a Payment System (capstone)
- Database Sharding (foundation / database fundamentals)
- Retry Patterns (advanced / reliability resilience)
- Rate Limiting for Resilience (advanced / reliability resilience)