Chat Email File Storage Metrics & Monitoring News Feed Notification System Peak Season Rate Limiter Typeahead Suggestion Uber URL Shortener Video Platform Web Crawler
Clarifications​
DAU : 50M
Messages : 100M/day
Media : 5% of messages
Functional Requirements​
chat with one person or a group (up to 100 people)
share photos, videos, and other files
See when your messages are sent, delivered, and read
See when other users are online
Get notifications for new messages
Non-functional Requirements​
fast and reliable
handle a lot of users and traffic
Assumptions​
Core Features : Group chat, file sharing
Sharing & Notifications : Share files and get alerts on edits/changes
Fast & Efficient : Quick file sync with minimal bandwidth use
Rock-Solid Reliability : No data loss
Load : HA and scalability
Estimations​
Traffic :
Activity: 50 M ( u s e r s ) ∗ 40  m e s s a g e s / d a y = 2 B  m e s s a g e s / d a y 50M (users) * 40\ messages/day = 2B\ messages/day 50 M ( u sers ) ∗ 40  m ess a g es / d a y = 2 B  m ess a g es / d a y
Media: 2 B / d a y  ( a c t i v i t y ) ∗ 5 % ( c o n t e n t  w i t h  m e d i a ) = 100 M  m e d i a / d a y 2B/day\ (activity) * 5\% (content\ with\ media) = 100M\ media/day 2 B / d a y  ( a c t i v i t y ) ∗ 5% ( co n t e n t  w i t h  m e d ia ) = 100 M  m e d ia / d a y
RPS: 2 B / d a y  ( a c t i v i t y ) 24 h ∗ 3600 s e c = 24 K  r e q / s e c \frac{2B/day\ (activity)}{24h * 3600sec} = 24K\ req/sec 24 h ∗ 3600 sec 2 B / d a y  ( a c t i v i t y ) ​ = 24 K  re q / sec
Storage :
Messages: 2 B / d a y  ( a c t i v i t y ) ∗ 100 b y t e ( m e s s a g e  s i z e ) = 200 G B / d a y 2B/day\ (activity) * 100byte (message\ size) = 200GB/day 2 B / d a y  ( a c t i v i t y ) ∗ 100 b y t e ( m ess a g e  s i ze ) = 200 GB / d a y
Media:
Media : 100 M / d a y ( m e s s a g e s ) ∗ 100 K B ( m e d i a  s i z e ) = 10 T B / d a y 100M/day (messages) * 100KB (media\ size) = 10TB/day 100 M / d a y ( m ess a g es ) ∗ 100 K B ( m e d ia  s i ze ) = 10 TB / d a y
10 years : ( 10 T B / d a y ( m e d i a ) + 0.2 T B / d a y ( m e s s a g e s ) ) ∗ 10 ( y e a r s ) ∗ 365 ( d a y ) = 38 P B (10TB/day (media) + 0.2TB/day (messages)) * 10 (years) * 365 (day) = 38PB ( 10 TB / d a y ( m e d ia ) + 0.2 TB / d a y ( m ess a g es )) ∗ 10 ( ye a rs ) ∗ 365 ( d a y ) = 38 PB
Bandwidth : 10 T B / d a y ( m e d i a ) + 0.2 T B / d a y ( m e s s a g e s ) 24 h ∗ 3600 s e c = 120 M B / s e c \frac{10TB/day (media) + 0.2TB/day (messages)}{24h * 3600sec} = 120MB/sec 24 h ∗ 3600 sec 10 TB / d a y ( m e d ia ) + 0.2 TB / d a y ( m ess a g es ) ​ = 120 MB / sec
Clarifications​
Features : Upload, download, sync files, get notifications
Platform : Mobile & Web
File Types : All supported
Encryption : Encrypted storage
Size Limit : Up to 10GB per file
Users : 10M DAU
Assumptions​
Core Features : Drag-and-Drop upload, download, multi-device sync, revision history
Sharing & Notifications : Share files and get alerts on edits/changes
Fast & Efficient : Quick file sync with minimal bandwidth use
Rock-Solid Reliability : No data loss
Load : HA and scalability
Estimations​
Storage Allocation : 50M (users) * 10GB (space allocation / user) = 500 PT/month
Upload Load : 10M (users) * 2 (upload/day) 24 (hours) / 3600 (seconds) = 240 upload/sec
and Peak QPS: QPS * 2 (peak ratio) = 480 upload/sec
Considerations​
Collect metrics from various sources (servers, databases)
Store and query time-series data (time-series database + Kafka)
Process and analyze data (Apache Storm, Flink, Spark)
Retrieve, visualize, and get alerted on metrics (query service, visualization system, alerting system)
Assumptions​
Functionality : Post, Get Feed, Update Feed
Focus : Text and not on large media
Ranking Assumed : Existing system already ranks posts
Global Reach : Serves 1B users worldwide
Share : 500 friends per user
Post Speed : Varies by location (1s - 1 minute)
Availability : Ensures posts aren't lost, but not guaranteed uptime
Ads (Bonus) : Optional addition of advertisements
Estimations​
News Feed Size : 1K (posts) * 10Kb (post size) = 10 MB
Clarifications​
Platform : Mobile & Web
Key Feature : Users post and see friends' posts in a news feed
Feed Order : Newest first
Friend Limit : Up to 5k friends per user
Traffic : 10M DAU
Feed Content : Text, images, and videos
Considerations​
Block Bots & DDoS : Use reCaptcha & rate limiting
Fast Loading : Minify & CDN deliver CSS/JS/Images
Scale on Demand : Spin up isolated dedicated instances & use queues
Prevent Overselling : Lock inventory on order
Fast Cache : Use cache & manage inventory
Minimize & Optimize : Reduce RPC, hops, & streamline services
Isolate Content : Separate static/dynamic & isolate uncommon items
Accurate Confirmation : Show true stock availability
Clarifications​
Match Start Only : Autocomplete suggestions only appear at the beginning of queries
Top 5 Picks : Show 5 most popular suggestions based on past searches
No Spell Check : Focuses on displaying popular suggestions, not correcting typos
Language Support : English for now, multi-language can be discussed later
Lowercase Letters : Search queries are assumed to be lowercase with no special characters
Traffic : Used by 10M DAU
Workflow : Each time you enter a character it'll send a request to a backend service to get suggestions: /search?q=f
-> /search?q=fo
-> /search?q=foo
Assumptions​
Performance : Autocomplete suggestions must appear within 100ms
to avoid lag
Relevance : Suggestions should be related to the search terms
Ranked : Show more popular suggestions first
Scalability : HA, scalability, and fault tolerance considerations
Encoding : UTF-8
Estimations​
Query Size : 4 (words) * 5 (characters) * 1 (byte/character) = 20 bytes/query
Load : QPS: 10M (users) * 10 (queries/day) * 20 (characters) / 24 (hours) / 3600 (seconds) = 24K query/sec
and Peak QPS: QPS * 2 (peak ratio) = 48K query/sec
Storage : 10M (users) * 10 (queries/day) * 20 (bytes/query) * 20% (new users) = 0.4 GB/day
Improvements​
Use Bloom filter for faster empty suggestions
Choose the right data structure for storing the data in database
Trie : Search is O(n)
Optimized Trie : All completions stored along with the prefix in the trie node (more space is required). Limit data by prefix size and completion count for better storage efficiency
Prefix Hash Tree : Search is O(1)
but uses more space
Choose proper data structure for storing the data in cache
Redis List : Search is O(n)
Redis Sorted Set : Search/Update O(log n)
If number of completions reach limit, remove lowest-scoring and add a new one
Clarifications​
Shortener : Long URL (e.g. https://www.myweb.com/learn/system-design/
) to short one (https://tinyurl.com/app
). Clicking short link goes to the original long URL
Volume : 100M shortened URLs generated daily
Length : Short as possible
Characters : Numbers (0-9) and letters (a-z, A-Z)
Assumptions​
URL shortening : given a long URL → return a much shorter URL
URL redirecting : given a shorter URL → redirect to the original URL
Characters : [A-Za-z0-9]
HA, scalability, and fault tolerance considerations, with distributed architecture
Estimations​
URL Shortener : 62 characters ([A-Za-z0-9]) ^ 7 (letters for short URL) = 3.5 trillion (URLs)
Write operation : 100M URLs/day → 100M (URLs) / 24 (hours) / 3600 (seconds) = 1160
(URLs/sec)
Read operation : assuming ratio of read operation to write operation is 10:1
→ 1160 (writes/sec) * 10 (reads ratio) = 11,600 (reads/sec)
Storage requirement : average URL length is 100 bytes
with 10 years
of data storage: 100M (URLs) * 365 (days) * 10 (years) * 100 (bytes) = 365TB
Clarifications​
Features : Upload & watch videos
Clients : Mobile, web, smart TV
Users : 5M DAU
Engagement : 30 min/day
Global : Supports international users
Playback : Accepts most resolutions/formats
Security : Encrypted videos
Upload Limit : Max 1GB videos (focuses on smaller sizes)
Cloud Storage : Consider leveraging existing services (AWS, Google Cloud, Azure)
Assumptions​
Fast Uploads : Videos upload quickly
Seamless Streaming : Videos play smoothly without buffering
Quality Control : Lets users adjust video quality
Cost-Effective : Needs low infrastructure costs
Scalability : HA, scalable, and reliable
Multi-Platform : Works on mobile apps, web browsers, and smart TVs
Estimations​
Storage : 5M (users) * 10% (users upload 1 video/day) * 300MB (video) = 150 TB/day
CDN Costs : 5M (users) * 5 (videos) * 300MB (video) * 0.02 (CDN cost $/GB) = 750k $/day
Clarifications​
Crawling Goal : Search Engine Indexing
Monthly Intake : 1B pages
Content Type : HTML only
Freshness : Include new & edited pages
Storage : Keep crawled HTML for 5 years
Duplicates : Ignore duplicate content
Assumptions​
Scale Up : Handle billions of web pages with parallelization (running tasks simultaneously)
Handle the Weird : Expect broken code, slow servers, crashes, and bad actors. The crawler must adapt
Be Nice : Don't overload websites with too many requests at once based on the website
Future-Proof : Easily add new content types (like images) without a major overhaul
Crawl : Use BFS algorithm
Estimations​
Load : QPS: 1M (pages) / 30 (days) / 24 (hours) / 3600 (seconds) = 400 pages/sec
and Peak QPS: QPS * 2 (peak ratio) = 800 pages/sec
Storage : 1B (pages) * 500kb (page size) = 500 TB/month
for 5 years 500 (TB/month) * 12 (months) * 5 (years) = 30 PB
Device Notifications Disney Hotstar Google Search Netflix Play API Netflix Video Stream StackOverflow Twitter (2022) Uber Live Chat