Design Instagram: System Design Interview Guide
Instagram serves 2 billion users. About 500 million of them use Stories every day, and roughly 95 million photos and videos are uploaded per day.
Designing Instagram combines photo upload pipelines, feed generation (similar to Twitter), Stories (a separate ephemeral feed), Direct Messages, and a heavy media CDN. The hardest piece is generating a personalized feed that mixes friends, followed accounts, and Reels recommendations in 200 milliseconds.
Asked at: Commonly asked at Meta (Instagram's parent), Google, Amazon, Snap, TikTok, and Pinterest. Often a more visual variant of Design Twitter.
Why this question is asked
Design Instagram tests photo and video upload pipelines, feed generation trade-offs (push vs pull), ephemeral content (Stories), and the recommendation feed for Reels. It is a richer version of Design Twitter with a real media pipeline attached.
Requirements
Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.
Functional requirements
- Users upload photos and videos with captions
- Users follow other users and see a chronological or ranked feed
- Stories: ephemeral 24-hour posts
- Direct Messages between users
- Like, comment, share, save posts
- Reels: a TikTok-style recommendation feed
- Search by user, hashtag, location
Non-functional requirements
- Feed load under 500 ms at the 99th percentile
- Photo upload under 5 seconds at the 95th percentile on 4G
- 99.99% availability
- Eventual consistency on like counts and feed
- Scale to 2B users
- Global media delivery via CDN
Back-of-envelope scale estimates
Show your math. Pulling numbers from thin air signals you have not thought about the load.
Total users
2B
Public Meta reporting. Assume about 1.2 profiles per person, since some people also run a business or secondary account.
Daily active users
1B
Public reporting: 1B+ DAU on Instagram.
Photos and videos uploaded per day
95M
Public reporting. Average ~95M new posts daily.
Feed reads per second (peak)
500K
1B DAU times 10 feed loads per day is 10B loads per day, or about 116K/s on average (10B / 86,400 s). A 4x peak factor gives about 460K/s, rounded to 500K.
Media storage growth
~700 PB/year
95M new media items times an average of 4 MB times 5x for multi-resolution thumbnails and processed copies is about 1.9 PB per day. Times 365 days, that is roughly 690 PB, or about 0.7 EB per year.
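These estimates are easy to check in a few lines. The Python sketch below recomputes peak feed QPS and yearly storage growth from the inputs stated above: 1B DAU, 10 loads per day, and a 4x peak for the feed; 95M items per day, 4 MB average, and 5x copies for storage. It uses decimal units (1 PB = 10^9 MB). The function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def peak_feed_reads_per_sec(dau: float, loads_per_user_per_day: float,
                            peak_factor: float) -> float:
    """Average feed reads per second, scaled up by a peak factor."""
    return dau * loads_per_user_per_day / SECONDS_PER_DAY * peak_factor


def media_storage_pb_per_year(items_per_day: float, avg_mb: float,
                              copy_factor: float) -> float:
    """Yearly storage growth in petabytes (decimal: 1 PB = 1e9 MB)."""
    mb_per_day = items_per_day * avg_mb * copy_factor
    return mb_per_day * 365 / 1e9


if __name__ == "__main__":
    # About 463K reads/s at peak.
    print(f"peak feed reads: {peak_feed_reads_per_sec(1e9, 10, 4):,.0f}/s")
    # About 694 PB/year.
    print(f"storage growth: {media_storage_pb_per_year(95e6, 4, 5):,.0f} PB/year")
```

Keeping the inputs as parameters lets you redo the math live when the interviewer changes an assumption, such as a 3x copy factor instead of 5x.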
High-level architecture
Upload path: the client uploads media to an Upload Service, which writes it to a regional blob store. The Media Processor generates thumbnails, runs ML pipelines (face detection, content classification, NSFW filtering), and stores the variants in a media store fronted by a CDN. Post metadata (caption, location, tags) is written to a sharded SQL store.
Feed path: the same hybrid push-pull model as Twitter. Posts from low-follower users are fanned out to follower timelines (Cassandra). Posts from high-follower users are pulled at feed-read time.
The Reels feed is generated by a separate recommendation pipeline (candidate generation plus ranking) trained on watch behavior. Stories are stored separately with a 24-hour TTL. DMs run on a separate persistent-connection gateway, similar to WhatsApp.
In a real interview, sketch this on the whiteboard before diving into any single box.
Core components
Walk through each service. The interviewer wants to hear what each one owns, not just the names.
Upload Service
Resumable upload endpoint for photos and videos. Writes raw media to a regional blob store. Emits an UploadComplete event for the Media Processor.
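Here is a minimal sketch of the resumable part. It assumes an offset-based protocol: the server accepts a chunk only if it starts at the last committed byte. `UploadSession`, `append_chunk`, and the in-memory `events` list are all hypothetical; `events` stands in for the queue that carries UploadComplete. A real service would persist bytes to the blob store, not to memory.

```python
from dataclasses import dataclass, field


@dataclass
class UploadSession:
    """Hypothetical server-side state for one resumable upload."""
    upload_id: str
    total_size: int
    data: bytearray = field(default_factory=bytearray)
    events: list = field(default_factory=list)  # stand-in for a message queue

    @property
    def committed_offset(self) -> int:
        # Bytes received so far; a reconnecting client asks for this value.
        return len(self.data)

    def append_chunk(self, offset: int, chunk: bytes) -> int:
        """Accept a chunk only if it starts exactly at the committed offset.

        Returns the new committed offset. A mismatched offset (for example a
        retried chunk that was already stored) is ignored, and the client
        resumes from the returned offset instead.
        """
        if offset != self.committed_offset:
            return self.committed_offset
        if self.committed_offset + len(chunk) > self.total_size:
            raise ValueError("chunk exceeds declared upload size")
        self.data.extend(chunk)
        if self.committed_offset == self.total_size:
            # In the design above, the Media Processor consumes this event.
            self.events.append(("UploadComplete", self.upload_id))
        return self.committed_offset
```

A client that loses its connection asks for `committed_offset` and resends from there. A retried chunk that was already stored is simply ignored.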
Media Processor
Consumes UploadComplete events. Generates multiple thumbnail sizes, runs face and object detection, applies any filters or AI transforms, and writes variants to the CDN-fronted media store.
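The resizing step can be planned before any pixels are touched. This sketch computes per-variant dimensions that preserve the aspect ratio and never upscale. The variant names and target widths in `VARIANT_WIDTHS` are assumptions for illustration, not Instagram's actual sizes. Image decoding, encoding, and the ML jobs are out of scope.

```python
from typing import Dict, Tuple

# Hypothetical target widths in pixels; the real variant set is not specified here.
VARIANT_WIDTHS: Dict[str, int] = {"thumbnail": 150, "grid": 320, "full": 1080}


def plan_variants(width: int, height: int) -> Dict[str, Tuple[int, int]]:
    """Return (width, height) per variant, preserving aspect ratio.

    Images are never upscaled: if the original is narrower than a target,
    that variant keeps the original dimensions.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    plan = {}
    for name, target_w in VARIANT_WIDTHS.items():
        w = min(width, target_w)
        h = max(1, round(height * w / width))
        plan[name] = (w, h)
    return plan
```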
Post Service
Writes post metadata (caption, media URLs, location, tags) to sharded SQL. Emits a PostCreated event for fan-out and search indexing.
Feed Service
Generates the home feed. Reads the user's precomputed timeline (Cassandra), merges in pulled content from followed celebrities, and applies a ranking model.
Fan-Out Service
Consumes PostCreated events. For users below a follower threshold, writes the post ID into each follower's timeline. For high-follower users, skips fan-out.
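This sketch shows the fan-out decision. It assumes an in-memory follower map in place of the follows table and a hypothetical cutoff of 10,000 followers; the real threshold is a tuning choice. Each timeline is capped at 1,000 post IDs to mirror the bound on user_feed in the data model below.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List

FANOUT_FOLLOWER_THRESHOLD = 10_000  # hypothetical cutoff
TIMELINE_CAP = 1_000                # mirrors the ~1000-post bound on user_feed


class FanOutService:
    def __init__(self, followers: Dict[int, List[int]]):
        # author_id -> follower ids (stand-in for the follows table)
        self.followers = followers
        self.timelines: Dict[int, Deque[int]] = defaultdict(
            lambda: deque(maxlen=TIMELINE_CAP))

    def on_post_created(self, author_id: int, post_id: int) -> int:
        """Handle a PostCreated event and return the number of timeline writes."""
        fans = self.followers.get(author_id, [])
        if len(fans) >= FANOUT_FOLLOWER_THRESHOLD:
            return 0  # celebrity: followers pull this author's posts at read time
        for follower_id in fans:
            # Newest first; deque(maxlen) drops the oldest entry once the cap is hit.
            self.timelines[follower_id].appendleft(post_id)
        return len(fans)
```

The returned write count makes the trade-off visible. One post from an account with 5,000 followers costs 5,000 timeline writes, which is the write storm the threshold is meant to cap.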
Stories Service
Stores ephemeral 24-hour posts in a separate Cassandra cluster with TTL. Has its own feed read path: which stories are unviewed for this user, ordered by close-friend priority.
Reels Recommendation Service
Two-stage ranking, as in YouTube's recommender. The candidate generator selects a few hundred Reels per user. The ranker scores each one on predicted watch time and engagement. Real-time signals from the current session refine the ordering.
DM Service
Persistent-connection gateway for direct messages, similar to WhatsApp's architecture but without E2E encryption by default (it is opt-in).
Data model
Pick the right store per table. Justify each choice with the access pattern, not by reflex.
users: user_id (PK), username (UNIQUE), profile_pic_url, follower_count_cached, following_count_cached. Sharded by user_id hash. Username has a unique constraint for handle reservation.
posts: post_id (PK, snowflake), author_id, media_urls[], caption, location_id, created_at. Sharded by author_id. Snowflake IDs encode the creation timestamp.
follows: follower_id (PK partition), followee_id (PK sort), created_at. Stored as two denormalized tables, both sharded. One is keyed by follower (whom do I follow), and one is keyed by followee (who follows me).
stories: story_id (PK), author_id, media_url, created_at, expires_at. Cassandra with a TTL of 24 hours from created_at. After expiry, reads no longer return the rows, and compaction purges them later without a cleanup job.
user_feed: user_id (PK partition), post_id (clustering column, ordered by timestamp). Cassandra. Bounded to the ~1000 most recent posts. Populated by fan-out for low-follower authors.
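Because posts is keyed by a snowflake ID, sorting by ID also sorts by creation time. The sketch below assumes a Twitter-style 64-bit layout: 41 bits of milliseconds since a custom epoch, 10 bits of machine ID, and 12 bits of per-millisecond sequence. Instagram's own scheme may allocate the bits differently, and the 2020-01-01 epoch is a hypothetical choice.

```python
import threading
import time

CUSTOM_EPOCH_MS = 1_577_836_800_000  # hypothetical epoch: 2020-01-01T00:00:00Z
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, machine_id: int):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: spin until the next one.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - CUSTOM_EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                    | (self.machine_id << SEQUENCE_BITS)
                    | self.sequence)


def timestamp_ms(snowflake_id: int) -> int:
    """Recover the creation time (Unix milliseconds) encoded in an ID."""
    return (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS
```

`timestamp_ms` recovers the creation time from the ID alone, which is what lets feeds and timelines order posts by ID.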
Deep dives
These are the conversations the interviewer is steering you toward. Practice each one until you can talk through it without notes.
Photo and video upload pipeline
The Upload Service writes raw media to a regional blob store and emits an event. The Media Processor consumes the event and generates several variants: a small thumbnail for the feed, a medium-size image for the profile grid, a full-size image for the detail view, and (for videos) multiple bitrate variants for adaptive playback. Each variant goes to the media store fronted by a CDN. During processing, ML jobs run: face detection (for tagging suggestions), content classification (for ads and discovery), and NSFW detection. The post is marked viewable only after processing completes, which usually takes a few seconds for photos and 30 or more seconds for videos.
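The rule that a post becomes viewable only after processing completes can be modeled as a small state machine. In this sketch, the blocking steps in `REQUIRED_STEPS` are an assumption: three image variants plus the NSFW check. A flagged NSFW result blocks the post permanently, while a failed variant stays incomplete so it can be retried. Which ML jobs should block visibility is a design choice. Here, face detection and classification are left out so a slow model does not hold back publishing.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical set of steps that must finish before a post becomes viewable.
REQUIRED_STEPS = frozenset({"thumbnail", "grid", "full", "nsfw_check"})


@dataclass
class PostProcessingState:
    post_id: int
    completed: Set[str] = field(default_factory=set)
    blocked: bool = False

    def record(self, step: str, ok: bool = True) -> None:
        """Record the outcome of one processing step."""
        if not ok:
            if step == "nsfw_check":
                self.blocked = True  # flagged content never becomes viewable
            return  # a failed variant stays incomplete so it can be retried
        self.completed.add(step)

    @property
    def viewable(self) -> bool:
        return not self.blocked and REQUIRED_STEPS <= self.completed
```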
Feed generation: push, pull, and ranking
Instagram uses the same hybrid push-pull model as Twitter. Low-follower authors fan out to follower timelines on write. Celebrity authors skip fan-out and are pulled at read time. The novelty in Instagram is the ranking step: even after merging push and pull content, a learned ranking model reorders posts based on predicted engagement, relationship strength (close friends rank higher), and freshness. The model runs as an online service called at feed-read time. A purely chronological feed is still offered as an option.
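This sketch shows the read-time merge. It assumes each candidate already carries a model-predicted engagement probability and a relationship score. The `score` function is a hand-weighted stand-in for the learned ranking model; the weights and the 6-hour freshness half-life are hypothetical. Pulled celebrity posts are de-duplicated against pushed entries, and `chronological=True` models the chronological option.

```python
import math
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Candidate:
    post_id: int
    author_id: int
    created_at: float    # Unix seconds
    p_engage: float      # model-predicted engagement probability, 0..1
    relationship: float  # 0..1, higher for close friends


def score(c: Candidate, now: float, half_life_h: float = 6.0) -> float:
    # Hypothetical hand-tuned blend standing in for the learned ranking model.
    age_h = max(0.0, (now - c.created_at) / 3600)
    freshness = math.exp(-math.log(2) * age_h / half_life_h)  # halves every half_life_h
    return 0.6 * c.p_engage + 0.3 * c.relationship + 0.1 * freshness


def build_feed(pushed: Iterable[Candidate], pulled: Iterable[Candidate],
               limit: int = 20, chronological: bool = False,
               now: Optional[float] = None) -> List[int]:
    now = time.time() if now is None else now
    merged = {c.post_id: c for c in pushed}  # precomputed timeline entries
    for c in pulled:                         # celebrity posts fetched at read time
        merged.setdefault(c.post_id, c)      # de-duplicate against pushed entries
    if chronological:
        ranked = sorted(merged.values(), key=lambda c: c.created_at, reverse=True)
    else:
        ranked = sorted(merged.values(), key=lambda c: score(c, now), reverse=True)
    return [c.post_id for c in ranked[:limit]]
```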
Stories: ephemeral feed with TTL
Stories are a separate feed entirely. Each story has a 24-hour TTL written into a Cassandra cluster. The read path is: list stories from accounts I follow that are within the last 24 hours and that I have not yet viewed, ordered by close-friend priority and recency. View state (which stories I have already seen) is stored in a per-user table. The TTL on Cassandra rows handles cleanup automatically. Stories never go into the main feed.
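This sketch runs the Stories read path over in-memory data. Cassandra stops returning expired rows on its own (for example, when rows are written with `USING TTL 86400`). The explicit `expires_at` check here only stands in for that behavior. The function name and data shapes are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

STORY_TTL_S = 24 * 3600


@dataclass(frozen=True)
class Story:
    story_id: int
    author_id: int
    created_at: float  # Unix seconds

    @property
    def expires_at(self) -> float:
        return self.created_at + STORY_TTL_S


def story_tray(viewer_follows: Set[int], close_friends: Set[int],
               stories_by_author: Dict[int, List[Story]], viewed: Set[int],
               now: Optional[float] = None) -> List[int]:
    """Unviewed, unexpired stories from followed accounts: close friends first, then newest."""
    now = time.time() if now is None else now
    live = [s for author in viewer_follows for s in stories_by_author.get(author, [])
            if s.expires_at > now and s.story_id not in viewed]
    # False sorts before True, so close friends come first; -created_at puts newest first.
    live.sort(key=lambda s: (s.author_id not in close_friends, -s.created_at))
    return [s.story_id for s in live]
```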
Reels recommendation feed
Reels is a TikTok-style recommendation feed: short videos pulled from across the entire catalog, not just accounts I follow. It uses the YouTube-style two-stage pipeline. A candidate generator (two-tower neural network) produces a few hundred candidate Reels per user from billions in the catalog. A ranker scores each candidate using predicted watch time, completion rate, share rate, and engagement. The ranker also enforces diversity (do not show two Reels from the same author back-to-back) and freshness (mix recent posts with high-engagement classics). The real-time layer adapts to the current session: if you watched a cooking Reel, it boosts cooking-adjacent content in the next pull.
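This sketch covers the final re-ranking stage. It assumes the candidate generator and ranker have already produced scored Reels. It applies a hypothetical 1.5x session boost to topics watched in the current session. It then fills slots greedily so the same author never appears twice in a row. The two-tower candidate generator and the ranking model itself are not shown.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Reel:
    reel_id: int
    author_id: int
    topic: str
    score: float  # ranker output, e.g. predicted watch time blended with engagement


def rerank(candidates: List[Reel], session_topics: Set[str],
           boost: float = 1.5, limit: int = 10) -> List[int]:
    """Apply a session boost, then pick greedily without repeating an author back-to-back."""
    def adjusted(r: Reel) -> float:
        return r.score * (boost if r.topic in session_topics else 1.0)

    pool = sorted(candidates, key=adjusted, reverse=True)
    out: List[Reel] = []
    while pool and len(out) < limit:
        last_author = out[-1].author_id if out else None
        # Highest-scoring remaining Reel whose author differs from the previous slot.
        idx = next((i for i, r in enumerate(pool) if r.author_id != last_author), None)
        if idx is None:
            break  # only same-author Reels remain; stop rather than break the rule
        out.append(pool.pop(idx))
    return [r.reel_id for r in out]
```

For example, after a cooking Reel in the session, a cooking Reel scored 0.7 (boosted to 1.05) outranks a travel Reel scored 0.9.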
Trade-offs to discuss
Every senior interviewer expects you to surface at least 3 of these. Pick the decisions, state the alternatives, and justify your choice.
Cassandra vs SQL for the user feed
Cassandra scales writes horizontally and handles the fan-out write storm cleanly. SQL would need careful sharding and would not scale as easily. The cost is no joins (each follower's feed is independent). Cassandra wins for the feed workload.
Ranked vs chronological feed
Ranked lifts engagement but is opaque to users. Chronological is predictable but mixes important and trivial posts. Instagram offers both: Following is chronological, Home is ranked. Most users stay on the ranked default.
TTL-based cleanup vs explicit deletion for Stories
TTL is cheap (Cassandra evicts rows automatically) and self-healing. Explicit deletion requires a scheduled job. TTL wins for a 24-hour ephemeral workload.
Reels candidate generator vs simple following-based feed
A candidate generator drives discovery beyond the follow graph, a major reason TikTok came to dominate short-form video. A pure follow-based feed limits exposure to creators users have not yet seen. Reels would lose to TikTok without the recommendation pipeline.
E2E encryption on DMs
Instagram introduced E2E DMs as opt-in, not default. The cost of default E2E would be losing server-side spam detection and content moderation. Opt-in is the pragmatic middle ground.
How Instagram actually does it
Instagram famously scaled to 14 million users with a 3-engineer infra team on Python/Django and PostgreSQL. Today it runs on Meta's infrastructure: TAO (a graph cache layer over MySQL), Cassandra for feeds, Haystack for photo storage, and a CDN fronted by Meta's edge network. The Reels recommendation system reuses architecture from Facebook's ranking systems with adaptations for short-form video. DMs run on a separate stack that overlaps with WhatsApp infrastructure. The photo and video processing pipeline runs on a queue-driven worker fleet that handles millions of uploads per day; ML inference for face detection and content classification runs as a separate job tier so a slow model does not block media availability. Search uses Unicorn, Meta's graph-indexing system originally built for Facebook search, adapted to handle hashtag, user, and place queries with personalization signals from the social graph. Stories were a late addition to the architecture but kept clean by being a separate Cassandra cluster with its own read and write paths, so the main feed pipeline did not have to absorb the 24-hour TTL semantics.
Lessons to study before this interview
If any of these topics are fuzzy, the interviewer will catch it. Each lesson is 15 to 60 minutes with diagrams, code, and a quiz.