DFS Media Service Design Documentation
This is one of the first ideation documents. It covers the client side of the product: how it will look to the client and how we can make it easy for developers to use.
INTRODUCTION
The DFS is designed to provide a scalable, fault-tolerant, and efficient solution for storing and retrieving large files. It leverages modern technologies like REST APIs, gRPC, Kafka, and a distributed architecture to handle high volumes of data and ensure robust performance.
SYSTEM OVERVIEW
Key features:
High scalability to manage growing storage needs.
Fault tolerance with temporary storage and reliable queuing mechanisms.
Efficient Node balancing for optimal resource utilization.
Immediate file availability via temporary storage.
Robust security with encryption and authentication.
Easy uploads via REST integration on the client side.
System Components
User-facing SDK for client-side
Node balancer and Node Manager
REST API interface
Ingress/Retrieval Nodes
Kafka Queue and Zookeeper
Storage Unit nodes
Replication Nodes
Metadata management system on a centralized database
Redis for caching and managing temporary data storage
Components
1. Node Manager:
Manages the ingress nodes and acts as a load balancer for the ingress and retrieval nodes. It monitors ingress nodes for available storage and current load, and ensures efficient distribution of file uploads.
Routes file uploads to the least loaded ingress node.
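As a rough illustration of this routing decision, a minimal TypeScript sketch follows. The IngressNodeStats shape and the selectIngressNode helper are hypothetical and only stand in for whatever load metrics the Node Manager actually collects.

```typescript
// Hypothetical shape of the stats the Node Manager keeps per ingress node.
interface IngressNodeStats {
  nodeId: string;
  freeStorageBytes: number; // storage currently available on the node
  activeUploads: number;    // uploads the node is handling right now
}

// Pick the least-loaded node that still has room for the incoming file.
function selectIngressNode(
  nodes: IngressNodeStats[],
  fileSizeBytes: number
): IngressNodeStats | undefined {
  return nodes
    .filter((n) => n.freeStorageBytes >= fileSizeBytes)
    .sort((a, b) => a.activeUploads - b.activeUploads)[0];
}

// Example: a 5 MB upload goes to node-b, which has space and the fewest uploads.
const target = selectIngressNode(
  [
    { nodeId: "node-a", freeStorageBytes: 10e9, activeUploads: 12 },
    { nodeId: "node-b", freeStorageBytes: 50e9, activeUploads: 3 },
  ],
  5 * 1024 * 1024
);
console.log(target?.nodeId); // "node-b"
```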
2. REST API Layer:
Acts as the gateway for user interactions with the DFS. Receives file upload requests and routes them to the appropriate ingress nodes. Provides endpoints for file retrieval and status checking.
Sends files to ingress nodes for temporary storage.
/upload: Accepts files and metadata.
/retrieve: Fetches files based on user requests.
/status: Checks the status of file distribution.
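As a rough illustration of these endpoints, a minimal Express sketch is shown below. The handler bodies, the x-file-name header, and the port are assumptions rather than the finalized API.

```typescript
import express from "express";

const app = express();
// Accept raw file bytes in the request body (limit is an illustrative value).
app.use(express.raw({ type: "application/octet-stream", limit: "1gb" }));

// /upload: accepts the file bytes plus metadata (here, a filename header).
app.post("/upload", (req, res) => {
  const fileName = req.header("x-file-name") ?? "unnamed";
  // In the real flow this would ask the Node Manager for an ingress node
  // and forward the bytes there; this stub only acknowledges receipt.
  res.status(202).json({ fileId: "generated-id", fileName, status: "queued" });
});

// /retrieve: fetches a file by id, from the ingress node or from storage nodes.
app.get("/retrieve/:fileId", (req, res) => {
  res.status(501).json({ error: "retrieval not implemented in this sketch" });
});

// /status: reports whether distribution of a file has completed.
app.get("/status/:fileId", (req, res) => {
  res.json({ fileId: req.params.fileId, status: "distributing" });
});

app.listen(3000);
```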
3. Ingress Node:
The ingress node is where a file is stored temporarily; once it is stored, the queue is triggered to distribute its chunks across multiple storage nodes.
Stores files temporarily until they are processed and distributed.
Serves files directly to users before distribution is complete.
Communicates with the Node Manager to receive files.
Sends files to Kafka for processing.
Deletes temporary files after distribution.
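A minimal sketch of this ingress-node flow, assuming a local ./tmp directory, a hypothetical file-distribution Kafka topic, and the kafkajs client; none of these names are finalized.

```typescript
import { Kafka } from "kafkajs";
import { promises as fs } from "fs";
import path from "path";

const kafka = new Kafka({ clientId: "ingress-node-1", brokers: ["localhost:9092"] });
const producer = kafka.producer();

// Store the file temporarily, then enqueue a distribution task for it.
async function acceptFile(fileId: string, data: Buffer): Promise<void> {
  await fs.mkdir("./tmp", { recursive: true });
  const tmpPath = path.join("./tmp", fileId);
  await fs.writeFile(tmpPath, data); // file is immediately servable from here
  await producer.connect();
  await producer.send({
    topic: "file-distribution",
    messages: [
      { key: fileId, value: JSON.stringify({ fileId, tmpPath, size: data.length }) },
    ],
  });
}

// Called once distribution to storage nodes is confirmed complete.
async function cleanupTemporaryFile(fileId: string): Promise<void> {
  await fs.unlink(path.join("./tmp", fileId));
}
```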
4. Kafka Queue:
Handles file processing, chunking, and distribution. Reliable queuing mechanism with retry policies. Parallel task execution for chunking and gRPC distribution.
Receives file processing tasks from ingress nodes.
Distributes tasks to storage nodes via gRPC.
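A sketch of the consumer side of the queue, under the same assumptions as the ingress-node sketch above (a hypothetical file-distribution topic and a 4 MiB chunk size); sendChunkToStorageNode is a placeholder for the real gRPC call.

```typescript
import { Kafka } from "kafkajs";
import { promises as fs } from "fs";

const kafka = new Kafka({ clientId: "distribution-worker", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "chunk-distributors" });

const CHUNK_SIZE = 4 * 1024 * 1024; // assumed 4 MiB chunks

async function sendChunkToStorageNode(fileId: string, index: number, chunk: Buffer): Promise<void> {
  // Placeholder for the gRPC call that ships the chunk to a storage node.
}

async function run(): Promise<void> {
  await consumer.connect();
  await consumer.subscribe({ topic: "file-distribution", fromBeginning: false });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const task = JSON.parse(message.value!.toString());
      const data = await fs.readFile(task.tmpPath);
      // Split the temporary file into fixed-size chunks and distribute them.
      for (let i = 0; i * CHUNK_SIZE < data.length; i++) {
        const chunk = data.subarray(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE);
        await sendChunkToStorageNode(task.fileId, i, chunk);
      }
    },
  });
}

run().catch(console.error);
```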
5. Storage Nodes:
Permanent storage units for file chunks. Store chunks of files based on hashing and distribution logic. Serve chunks during file retrieval.
Receive file chunks from Kafka-triggered processes via gRPC.
Reconstruct files for user retrieval.
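A sketch of one possible hashing-based placement rule plus the reconstruction step; the node list and the hash-modulo rule are illustrative choices, not the decided distribution logic.

```typescript
import { createHash } from "crypto";

// Illustrative set of storage node addresses (gRPC endpoints in the real system).
const storageNodes = ["storage-1:50051", "storage-2:50051", "storage-3:50051"];

// Deterministically map a chunk to a storage node from its content hash.
function placeChunk(chunk: Buffer): { node: string; chunkHash: string } {
  const chunkHash = createHash("sha256").update(chunk).digest("hex");
  const index = parseInt(chunkHash.slice(0, 8), 16) % storageNodes.length;
  return { node: storageNodes[index], chunkHash };
}

// Reconstruction: fetch the chunks in order (via gRPC in the real system)
// and concatenate them back into the original file.
function reconstructFile(orderedChunks: Buffer[]): Buffer {
  return Buffer.concat(orderedChunks);
}
```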
6. Metadata Storage:
Centralized Postgres database for metadata management. Tracks file and chunk metadata (e.g., size, type, upload time). Maintains chunk locations and file statuses.
Used by all components for consistent metadata updates and retrieval.
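Illustrative TypeScript shapes for the metadata the Postgres database could track; the field names and status values are assumptions, not a finalized schema.

```typescript
// One row per uploaded file.
interface FileRecord {
  fileId: string;
  fileName: string;
  sizeBytes: number;
  contentType: string;
  uploadedAt: Date;
  status: "temporary" | "distributing" | "distributed"; // drives retrieval routing
}

// One row per chunk, recording where it lives and how to reassemble the file.
interface ChunkRecord {
  fileId: string;      // file this chunk belongs to
  chunkIndex: number;  // position used to reconstruct the file in order
  chunkHash: string;   // content hash, also usable for placement and integrity checks
  storageNode: string; // storage node that currently holds the chunk
}
```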
7. SDK:
Provides a client-facing interface for file uploads and retrievals. Available as JavaScript and backend language SDKs. Abstracts complexities of gRPC and REST API interaction for the user.
Communicates with the REST API to upload files.
Fetches files using REST API endpoints.
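A sketch of what the JavaScript/TypeScript SDK surface might look like against the REST endpoints sketched earlier; the DfsClient name and method signatures are hypothetical.

```typescript
class DfsClient {
  constructor(private baseUrl: string) {}

  // Upload raw bytes; the REST API forwards them toward an ingress node.
  async upload(fileName: string, data: Uint8Array): Promise<{ fileId: string }> {
    const res = await fetch(`${this.baseUrl}/upload`, {
      method: "POST",
      headers: {
        "content-type": "application/octet-stream",
        "x-file-name": fileName,
      },
      body: data,
    });
    return res.json();
  }

  // Retrieve a file, whether it is still on the ingress node or already distributed.
  async retrieve(fileId: string): Promise<Uint8Array> {
    const res = await fetch(`${this.baseUrl}/retrieve/${fileId}`);
    return new Uint8Array(await res.arrayBuffer());
  }

  // Check whether distribution to storage nodes has completed.
  async status(fileId: string): Promise<{ status: string }> {
    const res = await fetch(`${this.baseUrl}/status/${fileId}`);
    return res.json();
  }
}

// Usage:
// const client = new DfsClient("https://dfs.example.com");
// const { fileId } = await client.upload("video.mp4", bytes);
```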
Workflow:
File Upload
The user uploads a file via the SDK.
The REST API forwards the file to the Node Manager.
The Node Manager assigns the file to an ingress node.
The ingress node temporarily stores the file and generates metadata (see the sketch after this list).
Kafka triggers chunking and distribution to storage nodes.
Temporary files are deleted after successful distribution.
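A sketch of the metadata the ingress node might generate in step 4, assuming a 4 MiB chunk size and SHA-256 content hashing; both are illustrative choices, and the field names echo the illustrative metadata shapes above.

```typescript
import { createHash } from "crypto";

const CHUNK_SIZE = 4 * 1024 * 1024; // assumed 4 MiB chunk size

// Build the metadata record for a newly received file.
function buildFileMetadata(fileId: string, fileName: string, contentType: string, data: Buffer) {
  return {
    fileId,
    fileName,
    contentType,
    sizeBytes: data.length,
    contentHash: createHash("sha256").update(data).digest("hex"),
    chunkCount: Math.ceil(data.length / CHUNK_SIZE),
    uploadedAt: new Date(),
    status: "temporary" as const, // flips to "distributed" after chunking completes
  };
}
```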
File Retrieval
The user requests a file via the SDK.
The REST API checks the metadata to determine the file's status:
Temporary Storage: The file is served directly from the ingress node.
Distributed: The file is reconstructed from chunks in storage nodes and served (see the sketch after this list).
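A sketch of this retrieval decision; readFromIngressNode and fetchChunk are placeholders for the actual ingress-node read and the gRPC chunk fetch.

```typescript
type FileStatus = "temporary" | "distributing" | "distributed";

async function readFromIngressNode(fileId: string): Promise<Buffer> {
  throw new Error("placeholder: read the temporary file from the ingress node");
}

async function fetchChunk(fileId: string, chunkIndex: number): Promise<Buffer> {
  throw new Error("placeholder: fetch one chunk from its storage node via gRPC");
}

async function retrieveFile(fileId: string, status: FileStatus, chunkCount: number): Promise<Buffer> {
  if (status === "temporary" || status === "distributing") {
    // Still on the ingress node: serve it directly for immediate availability.
    return readFromIngressNode(fileId);
  }
  // Distributed: pull every chunk in order and reassemble the original file.
  const chunks: Buffer[] = [];
  for (let i = 0; i < chunkCount; i++) {
    chunks.push(await fetchChunk(fileId, i));
  }
  return Buffer.concat(chunks);
}
```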
This DFS design balances performance, scalability, and user experience. It ensures data availability, integrity, and security while being capable of handling high data volumes. For implementation, the focus should be on:
Efficient integration of components.
Optimization of chunking and distribution processes.
Monitoring and proactive scaling.