Design Stripe: System Design Interview Guide
Stripe processes 1 trillion dollars in payment volume per year across 200+ countries, with 99.999% API availability and idempotency guarantees on every charge.
Designing Stripe means designing a payment system from the API down to a double-entry ledger. The defining concerns are correctness (charge exactly once), durability (no charge is ever lost), regulatory compliance (PCI DSS scope, KYC), and global reach (multiple currencies and payment methods). It is the canonical system design problem where consistency dominates speed.
Asked at: Stripe, PayPal, Adyen, Square, and most fintech companies. Also common at FAANG, especially Amazon (for the order pipeline) and Google (for Google Pay).
Why this question is asked
Design Stripe forces you to talk seriously about idempotency, the double-entry ledger, webhook delivery guarantees, PCI scope reduction, fraud signals, and interaction with global card networks. It is the cleanest test of whether a candidate can design a system where correctness matters more than throughput.
Requirements
Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.
Functional requirements
- Merchants create charges via REST API or hosted checkout
- Charges are routed to a card network (Visa, Mastercard) or alternative method
- Idempotent retries: the same idempotency key returns the same result, no double charge
- Webhooks fire on every state change (succeeded, failed, refunded)
- Refunds (full and partial) are first-class entities
- Subscriptions: recurring charges with retry and dunning logic
- Multi-currency support and FX conversion
Non-functional requirements
- 99.999% API availability (the famous five nines)
- Exactly-once charge semantics
- PCI DSS Level 1 compliance with minimum scope on the merchant side
- Webhook delivery with at-least-once semantics and per-customer ordering
- Fraud and chargeback detection
- Audit trail for every state transition
Back-of-envelope scale estimates
Show your math. Pulling numbers from thin air signals you have not thought about the load.
- Annual payment volume: $1T. Public Stripe reporting for 2023. Used to size ledger storage and reconciliation jobs.
- Charges per second (peak): 13K. Black Friday and other peaks. Average is ~1K per second; peak factor ~13x.
- API requests per second: 50K. Includes reads (charge lookups, customer fetches) and writes. Reads outnumber writes 5:1.
- Webhooks per day: 100M. Multiple events per charge (succeeded, captured, dispute opened). At-least-once delivery with retries amplifies this further.
- Ledger storage per year: 5 to 20 TB. Double-entry rows for each charge, plus rows for partial refunds and disputes. Append-only and partitioned by time.
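Since the section asks you to show your math, here is a quick consistency check of the estimates above. The inputs are the section's own figures. The only assumption is two ledger rows per charge, which ignores refunds and disputes. Everything else is derived arithmetic.

```python
# Inputs are this section's own estimates; everything below them is derived.
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

annual_volume_usd = 1_000_000_000_000   # $1T per year
avg_charges_per_sec = 1_000
peak_charges_per_sec = 13_000
api_rps = 50_000
reads_per_write = 5                     # reads outnumber writes 5:1
ledger_bytes_per_year = (5e12, 20e12)   # 5 TB to 20 TB
rows_per_charge = 2                     # assumption: one balanced pair per charge

charges_per_day = avg_charges_per_sec * SECONDS_PER_DAY
charges_per_year = avg_charges_per_sec * SECONDS_PER_YEAR
avg_charge_usd = annual_volume_usd / charges_per_year
peak_factor = peak_charges_per_sec / avg_charges_per_sec
api_writes_per_sec = api_rps / (reads_per_write + 1)
ledger_rows_per_year = charges_per_year * rows_per_charge
bytes_per_row = tuple(b / ledger_rows_per_year for b in ledger_bytes_per_year)

print(f"charges/day:      {charges_per_day:,}")
print(f"avg charge:       ${avg_charge_usd:,.2f}")
print(f"peak factor:      {peak_factor:.0f}x")
print(f"API writes/sec:   {api_writes_per_sec:,.0f}")
print(f"ledger rows/year: {ledger_rows_per_year:,}")
print(f"bytes per row:    {bytes_per_row[0]:.0f} to {bytes_per_row[1]:.0f}")
```

The results are roughly 86M charges per day, an average charge of about $32, and about 80 to 320 bytes per ledger row, which is a plausible row size. Two derived numbers are worth raising with the interviewer:

- 86M charges a day with several events each would exceed 100M webhooks a day unless many events have no subscribed endpoint.
- 50K requests/sec at 5:1 leaves about 8.3K writes/sec, which is below the 13K peak charge rate. The 50K figure therefore reads best as an average.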
High-level architecture
The client (a merchant's server or the hosted checkout page) calls the public API. The API Gateway terminates TLS, authenticates the API key, and routes the request to the appropriate service.

The Charge Service is the heart of the system. It:
- validates the request
- looks up or creates the customer
- dedupes by idempotency key (Redis with strong consistency, backed by a sharded SQL idempotency_keys table)
- writes a pending charge to the canonical Postgres store
- submits the charge to the appropriate card network gateway

When the network responds, the Charge Service updates the charge state, writes the corresponding rows to the double-entry Ledger, and emits a state-change event. The Webhook Service consumes these events and delivers them to merchant endpoints with at-least-once retries.

A Fraud Service runs synchronously on the request path, before submission to the network, to score each charge and optionally block it. A Reconciliation Service runs nightly to compare the ledger against bank and card network settlement reports.
In a real interview, sketch this on the whiteboard before diving into any single box.
Core components
Walk through each service. The interviewer wants to hear what each one owns, not just the names.
Public API Gateway
Authenticates API keys, enforces rate limits, validates request schemas, and routes to internal services. Logs every request for audit. Exposes /v1/charges, /v1/customers, /v1/refunds, and so on.
Charge Service
Owns the charge lifecycle. Dedupes by idempotency key. Writes the canonical charge record. Calls the card network gateway. Updates state on response. Emits events.
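The charge lifecycle can be sketched as a small state machine with an append-only event log. This is a minimal Python sketch under stated assumptions:
- The partially_refunded status and the transition table are hypothetical.
- The in-memory events list stands in for the charge_events table.
- It skips the authorization/capture split that the card network gateway handles.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical transition table; terminal states have no outgoing edges.
ALLOWED = {
    "pending": {"succeeded", "failed"},
    "succeeded": {"partially_refunded", "refunded"},
    "partially_refunded": {"partially_refunded", "refunded"},
    "failed": set(),
    "refunded": set(),
}

@dataclass
class Charge:
    charge_id: str
    amount_cents: int
    status: str = "pending"
    refunded_cents: int = 0
    events: list = field(default_factory=list)  # stands in for charge_events

    def transition(self, new_status: str) -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        # Append-only audit record: (timestamp, from_status, to_status).
        self.events.append((datetime.now(timezone.utc), self.status, new_status))
        self.status = new_status

    def refund(self, cents: int) -> None:
        total = self.refunded_cents + cents
        if cents <= 0 or total > self.amount_cents:
            raise ValueError("refund must be positive and cannot exceed the charge")
        # Validate the transition first so a rejected refund changes nothing.
        self.transition("refunded" if total == self.amount_cents else "partially_refunded")
        self.refunded_cents = total
```

Keeping the current status on the row and the full history in an append-only log matches the data model below: the row answers "what is it now", and the log answers "how did it get here".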
Idempotency Store
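Redis, configured for strong consistency, handles fast dedup, and a sharded SQL table backs it for durability. The key is the API key plus the idempotency_key. The value is the status code and response body of the first completed call, whether it succeeded or failed, so a declined charge stays declined on retry.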
Card Network Gateway
Talks to Visa, Mastercard, and other card networks over ISO 8583 or modern APIs. Handles 3D Secure flows, tokenization, and the distinction between authorization and capture.
Ledger Service
Double-entry bookkeeping. Every charge writes two balanced rows: a credit to the merchant's pending balance and a matching debit to Stripe's clearing account. All rows are append-only.
Webhook Service
Consumes state-change events from a durable queue. Delivers to merchant URLs with HMAC signing. Retries with exponential backoff for up to 3 days. Preserves per-customer ordering by partitioning the queue on a hash of customer_id.
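HMAC signing lets the merchant verify that a webhook came from the platform and was not altered. This sketch is modeled on the scheme Stripe documents for its Stripe-Signature header, where an HMAC-SHA256 is computed over "timestamp.payload". Treat the exact header layout, the v1 label, and the 300-second tolerance as assumptions for illustration. Including the timestamp in the MAC lets the receiver reject replayed deliveries.

```python
import hashlib
import hmac
import time
from typing import Optional

def _digest(secret: bytes, payload: bytes, timestamp: int) -> str:
    # MAC over "<timestamp>.<payload>" so the timestamp cannot be swapped.
    message = str(timestamp).encode() + b"." + payload
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def sign(secret: bytes, payload: bytes, timestamp: int) -> str:
    """Sender side: build a header value of the form 't=<ts>,v1=<hex>'."""
    return f"t={timestamp},v1={_digest(secret, payload, timestamp)}"

def verify(secret: bytes, payload: bytes, header: str,
           tolerance_s: int = 300, now: Optional[int] = None) -> bool:
    """Merchant side: parse, reject stale timestamps, compare in constant time."""
    try:
        fields = dict(part.split("=", 1) for part in header.split(","))
        timestamp = int(fields["t"])
        received = fields["v1"]
    except (KeyError, ValueError):
        return False  # malformed header
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > tolerance_s:
        return False  # too old or too far in the future: possible replay
    return hmac.compare_digest(_digest(secret, payload, timestamp), received)
```

The merchant must verify against the raw request body bytes. Re-serialized JSON can differ byte for byte and fail the check.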
Fraud Service
Synchronous risk scoring on the charge path. Inputs include card history, customer history, IP geolocation, and device fingerprint. Returns a block, review, or allow decision.
Reconciliation Service
Nightly batch job comparing the internal ledger to settlement reports from card networks and partner banks. Flags discrepancies for ops investigation.
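At its core, the nightly comparison is a keyed diff. The sketch below makes two assumptions:
- Both the ledger extract and the settlement report have already been normalized into (charge_id, currency, amount_cents) rows. That row shape is hypothetical; real settlement files vary by network and bank.
- Amounts are summed per key first, so a charge and its partial refund net out before comparison.

```python
from collections import defaultdict

def reconcile(ledger_rows, settlement_rows):
    """Each row is (charge_id, currency, amount_cents). Returns a sorted list of
    (kind, (charge_id, currency), ledger_amount, settlement_amount) discrepancies."""
    ours, theirs = defaultdict(int), defaultdict(int)
    for charge_id, currency, cents in ledger_rows:
        ours[(charge_id, currency)] += cents
    for charge_id, currency, cents in settlement_rows:
        theirs[(charge_id, currency)] += cents

    issues = []
    for key in sorted(ours.keys() | theirs.keys()):
        if key not in theirs:
            issues.append(("missing_in_settlement", key, ours[key], None))
        elif key not in ours:
            issues.append(("missing_in_ledger", key, None, theirs[key]))
        elif ours[key] != theirs[key]:
            issues.append(("amount_mismatch", key, ours[key], theirs[key]))
    return issues
```

Each discrepancy kind maps to a different ops playbook. A charge missing from settlement may just be late. A charge missing from the ledger or with a mismatched amount points to a bug or a network-side adjustment.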
Data model
Pick the right store per table. Justify each choice with the access pattern, not by reflex.
charges
Columns: charge_id (PK), amount_cents, currency, customer_id, merchant_id, status, created_at, idempotency_key
Sharded by merchant_id. The idempotency key is indexed and unique per merchant. The status column holds the current state, and every transition is also appended to an immutable charge_events table.
ledger_entries
Columns: entry_id (PK), charge_id, account_id, amount_cents, direction (debit, credit), currency, posted_at
Append-only double-entry. Every charge produces a balanced pair of rows. Partitioned by month for archival.
customers
Columns: customer_id (PK), merchant_id, email, stripe_user_id_hash, created_at
Sharded by merchant_id. Card details never live here; they live in a separate PCI-scoped vault.
payment_methods (PCI vault)
Columns: method_id (PK), customer_id, type (card, ach), card_last_4, card_token, expires_at
Lives in a PCI DSS Level 1 enclave. The token is what non-PCI services see. The raw PAN is never persisted outside the vault.
webhook_deliveries
Columns: delivery_id (PK), event_id, endpoint_url, status, attempts, next_retry_at
Tracks each attempt. Retries follow exponential backoff. After 3 days of failures, the delivery is marked permanently failed and surfaces in the merchant dashboard.
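Two of the guarantees above can be pushed down into the database itself: one charge per (merchant_id, idempotency_key), and an append-only ledger. This sketch uses SQLite through Python's standard sqlite3 module so it runs anywhere.

Sharding by merchant_id and monthly partitioning are not expressible in SQLite and are left out. The trigger names and the insert_charge helper are hypothetical. A production Postgres schema would use equivalent constraints and triggers (or revoked UPDATE/DELETE privileges).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE charges (
    charge_id       TEXT PRIMARY KEY,
    merchant_id     TEXT NOT NULL,
    customer_id     TEXT,
    amount_cents    INTEGER NOT NULL CHECK (amount_cents > 0),
    currency        TEXT NOT NULL,
    status          TEXT NOT NULL,
    idempotency_key TEXT NOT NULL,
    created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (merchant_id, idempotency_key)   -- one charge per key per merchant
);
CREATE TABLE ledger_entries (
    entry_id     INTEGER PRIMARY KEY,
    charge_id    TEXT NOT NULL REFERENCES charges(charge_id),
    account_id   TEXT NOT NULL,
    amount_cents INTEGER NOT NULL CHECK (amount_cents > 0),
    direction    TEXT NOT NULL CHECK (direction IN ('debit', 'credit')),
    currency     TEXT NOT NULL,
    posted_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Append-only: the database itself rejects updates and deletes.
CREATE TRIGGER ledger_no_update BEFORE UPDATE ON ledger_entries
BEGIN SELECT RAISE(ABORT, 'ledger_entries is append-only'); END;
CREATE TRIGGER ledger_no_delete BEFORE DELETE ON ledger_entries
BEGIN SELECT RAISE(ABORT, 'ledger_entries is append-only'); END;
""")

def insert_charge(conn, charge_id, merchant_id, amount_cents, currency, idem_key):
    # `with conn` commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO charges (charge_id, merchant_id, amount_cents, currency,"
            " status, idempotency_key) VALUES (?, ?, ?, ?, 'pending', ?)",
            (charge_id, merchant_id, amount_cents, currency, idem_key),
        )

insert_charge(conn, "ch_1", "m_1", 5000, "usd", "key-abc")
try:
    insert_charge(conn, "ch_2", "m_1", 5000, "usd", "key-abc")  # same key, same merchant
except sqlite3.IntegrityError as exc:
    print("duplicate rejected:", exc)
```

The unique constraint is the last line of defense behind the idempotency store. Even if a cache miss let a retry through, the second insert fails instead of creating a second charge.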
Deep dives
These are the conversations the interviewer is steering you toward. Practice each one until you can talk through it without notes.
Idempotency: exactly-once on a retry-prone network
The internet is unreliable. A merchant's POST to /v1/charges might time out after the server has already created the charge. If the merchant retries, you cannot create a second charge.

Stripe's solution: every write request takes an Idempotency-Key header (a UUID generated by the client). On the first request, the server claims the key as in progress, processes the request, and then stores the response keyed by (api_key, idempotency_key). A retry that arrives while the first request is still in flight is rejected with a conflict error instead of starting a second charge. A retry after completion gets the cached response without re-processing.

The idempotency store has to be strongly consistent (Redis with replication and quorum, or a sharded SQL table), because a stale read causes a double charge. Keys expire after 24 hours. This is the single most important pattern in payments.
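The flow above fits in one function: claim the key, run the handler once, and replay the stored result. This is a minimal in-memory sketch of that flow, not a distributed implementation. A threading lock stands in for the strongly consistent store, and the IdempotencyStore and IdempotencyConflict names are hypothetical.

It makes two assumptions:
- The handler returns a response for declines too, so they are cached like successes.
- An exception from the handler means no side effect happened, so the key is released for a clean retry.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple

class IdempotencyConflict(Exception):
    """Raised when a retry arrives while the first request is still in flight."""

class IdempotencyStore:
    """In-memory stand-in for a strongly consistent idempotency store.
    Maps (api_key, idempotency_key) -> (state, response, created_at)."""

    def __init__(self, ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[Tuple[str, str], Tuple[str, Optional[dict], float]] = {}

    def execute(self, api_key: str, idem_key: str, handler: Callable[[], dict]) -> dict:
        key = (api_key, idem_key)
        now = self._clock()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and now - entry[2] >= self._ttl:
                entry = None  # expired: treat as a brand-new key
            if entry is not None:
                state, response, _ = entry
                if state == "done":
                    return response  # replay the first result, success or decline
                raise IdempotencyConflict(idem_key)
            # Claim the key before doing any work so a concurrent retry sees it.
            self._entries[key] = ("in_progress", None, now)
        try:
            response = handler()
        except Exception:
            # Assumption: an exception means no side effect, so release the key.
            with self._lock:
                self._entries.pop(key, None)
            raise
        with self._lock:
            self._entries[key] = ("done", response, now)
        return response
```

The key is scoped by API key, so two merchants who happen to send the same UUID never see each other's responses.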
Double-entry ledger and balance correctness
A payment is not a single number changing. It is a pair: money flows out of one account and into another.

Stripe writes every charge as two ledger entries: a credit to the merchant's pending balance and a matching debit to Stripe's clearing account. When the bank settles, two more entries are written: a debit to the pending balance and a credit to the settled balance. Refunds reverse the pattern.

The ledger is append-only; you never update a row, you write a new one. This gives you a complete audit trail and lets you reconstruct balances at any point in time: the balance of any account at time T is the sum of all its entries up to T. To keep that query fast, the system takes periodic snapshots and computes the balance as snapshot + delta, where the delta is the entries posted since the snapshot.
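Here is the same model in miniature: balanced postings, an append-only entry list, and balance queries answered as snapshot + delta. This is a sketch under assumptions:
- A monotonically increasing sequence number stands in for time T.
- The account names are hypothetical.
- Credits count as positive and debits as negative. With that sign convention, a merchant balance reads as a positive number and the clearing account goes negative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    seq: int           # posting order; stands in for posted_at
    account: str
    amount_cents: int
    direction: str     # "debit" or "credit"

class Ledger:
    def __init__(self):
        self._entries = []        # append-only; never mutated in place
        self._snapshot_seq = 0    # entries[:_snapshot_seq] are folded into _snapshot
        self._snapshot = {}       # account -> balance as of _snapshot_seq

    def post(self, *legs):
        """Append a group of (account, amount_cents, direction) legs that must balance."""
        for _, amount, direction in legs:
            if amount <= 0 or direction not in ("debit", "credit"):
                raise ValueError("each leg needs a positive amount and a direction")
        debits = sum(a for _, a, d in legs if d == "debit")
        credits = sum(a for _, a, d in legs if d == "credit")
        if not legs or debits != credits:
            raise ValueError("unbalanced posting")
        for account, amount, direction in legs:
            self._entries.append(Entry(len(self._entries) + 1, account, amount, direction))

    @staticmethod
    def _signed(e: Entry) -> int:
        return e.amount_cents if e.direction == "credit" else -e.amount_cents

    def take_snapshot(self):
        totals = dict(self._snapshot)
        for e in self._entries[self._snapshot_seq:]:
            totals[e.account] = totals.get(e.account, 0) + self._signed(e)
        self._snapshot, self._snapshot_seq = totals, len(self._entries)

    def balance(self, account, as_of=None):
        """Sum of the account's entries with seq <= as_of (default: all)."""
        as_of = len(self._entries) if as_of is None else as_of
        if as_of >= self._snapshot_seq:   # snapshot + delta
            start, total = self._snapshot_seq, self._snapshot.get(account, 0)
        else:                             # point in time before the snapshot: full scan
            start, total = 0, 0
        for e in self._entries[start:as_of]:
            if e.account == account:
                total += self._signed(e)
        return total

ledger = Ledger()
# Charge: credit the merchant's pending balance, debit the clearing account.
ledger.post(("stripe:clearing", 5000, "debit"), ("merchant:m1:pending", 5000, "credit"))
# Settlement: move the funds from pending to settled.
ledger.post(("merchant:m1:pending", 5000, "debit"), ("merchant:m1:settled", 5000, "credit"))
ledger.take_snapshot()
print(ledger.balance("merchant:m1:pending", as_of=2), ledger.balance("merchant:m1:settled"))
```

Because every posting balances, the balances of all accounts always sum to zero. That invariant makes a cheap health check to run against the real ledger.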
Webhook delivery with at-least-once and ordering
When a charge succeeds, Stripe must notify the merchant via webhook. The merchant's endpoint might be down, slow, or returning 500s.

The Webhook Service consumes events from a durable Kafka topic. For each event, it POSTs to the merchant URL with an HMAC signature so the merchant can detect tampering. On a non-2xx response, it retries with exponential backoff (a few seconds, then minutes, then hours) for up to 3 days. After the final retry, the delivery is marked permanently failed and exposed in the dashboard for manual replay.

Ordering matters within a customer's events: charge.succeeded must arrive before charge.refunded. To keep that order, the topic is partitioned by customer_id, so all events for the same customer go through the same consumer instance. Partition order only carries through to delivery if that consumer sends each customer's events one at a time. As a result, a failing event holds back later events for the same customer until it succeeds or exhausts its retries.
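Two mechanics from the paragraph above can be sketched in code: a stable partition choice per customer and a capped exponential retry schedule that stops at the 3-day window.

The base delay (5 s), growth factor (3x), per-attempt cap (6 h), and jitter are hypothetical parameters, not Stripe's actual values. SHA-256 is only a stable stand-in for the partitioner hash; Kafka's default partitioner hashes record keys with murmur2.

```python
import hashlib
import random
from typing import List, Optional

MAX_WINDOW_S = 3 * 24 * 3600  # stop retrying after 3 days

def partition_for(customer_id: str, num_partitions: int) -> int:
    # Stable hash: Python's built-in hash() of a str is salted per process,
    # so it would route the same customer differently after a restart.
    digest = hashlib.sha256(customer_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def retry_schedule(base_s: float = 5.0, factor: float = 3.0,
                   cap_s: float = 6 * 3600, jitter: float = 0.1,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Cumulative retry times (seconds after the first failure) within the window."""
    rng = rng or random.Random()
    offsets, delay, elapsed = [], base_s, 0.0
    while True:
        # Jitter spreads retries so a recovering endpoint is not hit all at once.
        wait = min(delay, cap_s) * (1 + rng.uniform(-jitter, jitter))
        if elapsed + wait > MAX_WINDOW_S:
            return offsets
        elapsed += wait
        offsets.append(elapsed)
        delay *= factor
```

With these parameters and no jitter, retries land at 5 s, 20 s, 65 s, and so on. The delay grows through minutes into hours and then holds at one attempt every 6 hours until the 3-day window closes.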
PCI scope reduction and tokenization
Anyone who stores, processes, or transmits raw card numbers (PANs) is in PCI DSS scope, which is expensive: audits, isolated networks, and controlled access. A processor at Stripe's volume must meet Level 1, the most stringent validation tier.

Stripe's tokenization model works like this. The merchant integrates Stripe.js or Elements on their front end. The card number is captured directly by Stripe's iframe, posted to Stripe's PCI-scoped vault, and exchanged for a token. The merchant's server only ever sees the token. This keeps the vast majority of merchants out of the heavy parts of PCI scope: because they never handle a raw PAN, their compliance burden shrinks dramatically.

Inside Stripe, the vault is a hardened service with HSM-backed encryption keys, network isolation, and audit logging. Tokens map to card data only inside the vault; the rest of the system uses tokens.
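Here is the vault boundary in code: the PAN goes in, and only a token plus display data comes out. This is a toy sketch.

- A real vault encrypts PANs with HSM-held keys inside an isolated network; here they sit in a dict.
- The CardVault name, the tok_ prefix, and detokenize_for_network are hypothetical.
- The Luhn checksum is the standard check digit used on card numbers.

```python
import secrets

def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

class CardVault:
    """Toy stand-in for the PCI-scoped vault. PANs never leave this class
    except through detokenize_for_network."""

    def __init__(self):
        self._pans = {}  # token -> PAN (a real vault stores ciphertext)

    def tokenize(self, pan: str) -> dict:
        digits = pan.replace(" ", "")
        if not (digits.isdigit() and luhn_ok(digits)):
            raise ValueError("invalid card number")
        token = "tok_" + secrets.token_urlsafe(16)  # unguessable, carries no card data
        self._pans[token] = digits
        # Only non-sensitive data goes back to non-PCI services.
        return {"token": token, "last4": digits[-4:]}

    def detokenize_for_network(self, token: str) -> str:
        # Only the card network gateway, inside the PCI boundary, should call this.
        return self._pans[token]
```

The token is random rather than derived from the PAN, so a leaked token reveals nothing about the card. Only a party with access to the vault can turn it back into a PAN.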
Trade-offs to discuss
Every senior interviewer expects you to surface at least 3 of these. Pick the decisions, state the alternatives, and justify your choice.
Strong consistency on idempotency vs eventual
Eventual consistency can cause double charges: a retry that reads a stale replica will not see the first request's key. Strong consistency is non-negotiable here. Redis with quorum writes, or a SQL primary, provides it at a small latency cost.
Synchronous fraud check vs async
Async would let high-risk charges complete and force a refund afterward, which is bad for merchants. Synchronous adds 50-100ms to the charge path but blocks bad charges. Synchronous wins.
Single global ledger vs per-currency
Single ledger requires FX at every cross-currency entry, which adds complexity. Per-currency ledgers are cleaner but require explicit reconciliation rows for cross-currency transfers. Stripe uses per-currency ledgers with explicit FX entries.
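Here is what "explicit FX entries" can look like with per-currency ledgers: one posting in the source currency and one in the target currency, each balancing on its own. In this sketch:
- The fx:clearing account name is hypothetical.
- Signed minor-unit amounts are used: positive means the account gains value.
- Banker's rounding of the converted amount is an assumption.

Real FX handling also records the quoted rate and fees, and it respects currency-specific minor units (some currencies have no cents).

```python
from collections import defaultdict
from decimal import ROUND_HALF_EVEN, Decimal

def fx_transfer(ledgers, amount_cents, from_ccy, to_ccy, rate, source, dest,
                fx_account="fx:clearing"):
    """Record a cross-currency movement as two postings, each balanced within
    its own currency's ledger. `rate` is a string to avoid binary float error."""
    converted = int((Decimal(amount_cents) * Decimal(rate))
                    .quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))
    # Source-currency ledger: value leaves `source` and lands in the FX account.
    ledgers[from_ccy] += [(source, -amount_cents), (fx_account, amount_cents)]
    # Target-currency ledger: the FX account pays the converted amount to `dest`.
    ledgers[to_ccy] += [(fx_account, -converted), (dest, converted)]
    return converted

def currency_totals(ledgers):
    return {ccy: sum(amount for _, amount in rows) for ccy, rows in ledgers.items()}

ledgers = defaultdict(list)
usd_cents = fx_transfer(ledgers, 10_000, "eur", "usd", "1.10",
                        source="customer:card", dest="merchant:m1")
print(usd_cents, currency_totals(ledgers))
```

Each currency ledger nets to zero on its own. Meanwhile, the FX account holds +EUR in one ledger and -USD in the other. That per-currency position is exactly what the explicit reconciliation rows exist to track.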
Append-only ledger vs mutable balances
Mutable balances are faster to query but lose audit history. Append-only with periodic snapshots gives you both correctness and reasonable query speed. Append-only wins in any system where audit matters.
Self-built card network gateway vs Adyen-style aggregator
Self-built is months of work and ongoing certification, but cuts the per-transaction fee. Aggregator is faster to ship but expensive. Stripe is past the threshold where self-built makes sense. A startup is not.
How Stripe actually does it
Stripe's API famously prioritizes correctness over speed. The idempotency key pattern is now an industry standard largely because of Stripe. Stripe's primary online datastore is DocDB, an internal database-as-a-service built on MongoDB Community and sharded horizontally. The PCI vault is a custom tokenization service with HSM-backed encryption. Webhooks run on a separate Kafka cluster with delivery tracking in a dedicated SQL store. The fraud system (Radar) uses a custom ML pipeline trained on transaction history across the network. Stripe's infrastructure runs primarily on AWS with extensive use of MongoDB, Kafka, and Redis.