Many systems design questions are intentionally left very vague and are literally given in the form of Design Facebook. It's your job to ask clarifying questions to better understand the system that you have to build.
* IDs must be unique and sortable
* IDs should contain numeric values
* Should fit into 64 bit
* Should scale to 1ok per second
Generates Twitter-like Snowflake ids. In short, This is an id scheme to generate unique 64 bit ids which are roughly sortable across multiple systems without a central instance. The IDs are 64-bits in size and are generated with the combination of the following:
- Epoch timestamp in milliseconds precision - 41 bits
- Node ID - 10 bits. This gives us 1024 nodes/machines.
- Sequence Number - Local counter per machine - 13 bits (Theoritically can generate 2^13-1 per milli second
- https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c
- https://blog.twitter.com/engineering/en_us/a/2010/announcing-snowflake
- https://www.callicoder.com/distributed-unique-id-sequence-number-generator/
- https://github.com/phxql/snowflake-id
* Convert a long URL to short URL
* Short URL should be a combination of 0-9, a-z and A-Z
* Short URL length should be as small as possible
* Support 100 million URLs per day
* Assuming the service will run for 10 years, it needs to support 10 * 365 * 10 million = 365 Billion records
* Assuming each record is around 100 bytes, should support 365 TB of data
* To support more than 365 billion unique records with (0-9, a-z & A-Z) ie 62 unique charecters, its safe to keep the length of the shortened URL at 7 ( 62 ^ 7 = 3.5 trillion)
* Generate an unique ID for each long URL (Refer Distributed ID Generator)
* Generate a Base62 representation of this ID. Thats going to be the short URL
* Store all these 3 attributes in a database for each record
* Given a set of seed URLs, crawl all the web pages, download them. Extract the URLs amd add them to the list and continue
* Should scale out
*
* Should crawl around 1 billion pages per month
* Store data for around 5 years
* Pages with duplicate content should be avoided
* Pages to be crawled = 1000000000 (1 billion) / 24 / 3600 = 400 pages / second
* Data storage requirement
* Assuming each web page is 500 KB. Total size = 1000000000 * 500 = 500 TB per month * 12 * 5 = 3000 TB
* Support for Email, SMS & Push notifications(both IOS and Android)
* Real time
* Notifaction service should expose an interface using which client applications can send notfications
* User preference service - Users should be able to opt out