
Releases: zzhtttsss/catdfs

v1.0.0

09 Nov 08:44
439dd75

CatDFS

CatDFS is an open-source distributed file system implemented in Golang. Its design draws on The Google File System paper and on HDFS.

CatDFS includes four subprojects:

  • master: the brain of the system. It manages the chunkservers and the metadata, similar to the NameNode in HDFS.
  • chunkserver: the storage node of the system. It stores file data, similar to the DataNode in HDFS.
  • client: the component through which users interact with the file system.
  • base: common methods, constants and protocol files shared by the other subprojects, each of which depends on it.

As a distributed file system, CatDFS mainly has the following features:

  • Rich file operations: upload (add), download (get), move (move), delete (remove), get file information (stat), list a directory (list) and rename (rename). Append writes (append) are planned for a future release.
  • High reliability: each file is stored as multiple replicas placed on different chunkservers, and the number of replicas is a configurable parameter.
  • High availability: the master can be deployed as a cluster to avoid a single point of failure, and the Raft algorithm keeps the metadata consistent. As long as more than half of the master nodes are available, the system keeps working.
  • Shrinkage management: when a chunkserver crashes, the system performs a shrinkage operation and, following the shrinkage strategy, re-creates the files that were stored on the crashed chunkserver on other chunkservers so that no replicas are lost.
  • Expansion management: users can add chunkservers at any time, and the system transfers files from the other chunkservers to the new ones according to the expansion strategy.
  • Load balancing: when a user uploads files, or when the system shrinks or expands, the system selects the chunkservers on which to place each file so that disk usage stays roughly balanced across all chunkservers.
  • Crash recovery: after a crash, both master and chunkserver nodes can be restarted and rejoin the system without any configuration, and the information stored on them (metadata or chunks) is not lost.
  • System monitoring: cAdvisor, Prometheus and Grafana are used to visualize the system's runtime metrics.
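
The reliability and load-balancing features above amount to choosing, for each file, a set of distinct chunkservers with low disk usage. The release notes do not describe the actual placement strategy, so the following is only a minimal sketch under stated assumptions: the `Chunkserver` type, its fields and the `PickReplicaTargets` function are hypothetical names invented for illustration, and the greedy "least-used first" rule is a stand-in, not the strategy CatDFS implements.

```go
package main

import (
	"fmt"
	"sort"
)

// Chunkserver is a hypothetical view of a storage node. The type and its
// field names are illustrative and are not taken from the CatDFS source.
type Chunkserver struct {
	ID        string
	UsedBytes uint64
	Capacity  uint64
}

// usage returns the fraction of the disk that is in use.
// A node reporting zero capacity is treated as full.
func (c Chunkserver) usage() float64 {
	if c.Capacity == 0 {
		return 1
	}
	return float64(c.UsedBytes) / float64(c.Capacity)
}

// PickReplicaTargets returns the IDs of the `replicas` chunkservers with the
// lowest disk usage. Each replica lands on a different chunkserver.
// This greedy rule is an assumption made for the sketch.
func PickReplicaTargets(servers []Chunkserver, replicas int) ([]string, error) {
	if replicas <= 0 {
		return nil, fmt.Errorf("replica count must be positive, got %d", replicas)
	}
	if len(servers) < replicas {
		return nil, fmt.Errorf("need %d chunkservers, only %d available", replicas, len(servers))
	}

	// Sort a copy so the caller's slice is left untouched.
	sorted := make([]Chunkserver, len(servers))
	copy(sorted, servers)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].usage() < sorted[j].usage()
	})

	ids := make([]string, 0, replicas)
	for _, s := range sorted[:replicas] {
		ids = append(ids, s.ID)
	}
	return ids, nil
}

func main() {
	servers := []Chunkserver{
		{ID: "cs-a", UsedBytes: 80, Capacity: 100},
		{ID: "cs-b", UsedBytes: 10, Capacity: 100},
		{ID: "cs-c", UsedBytes: 40, Capacity: 100},
	}
	ids, err := PickReplicaTargets(servers, 2)
	if err != nil {
		fmt.Println("placement failed:", err)
		return
	}
	fmt.Println(ids)
}
```

With the three example nodes, the two replicas go to `cs-b` and `cs-c`, the two least-used chunkservers. A real placement strategy would likely also account for chunks already in flight and for failure domains, which this sketch ignores.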

As a project suitable for newcomers, CatDFS mainly has the following characteristics:

  • Complete functional features: it implements most of the functions and features a distributed file system needs, which helps in understanding and learning distributed systems and the components they depend on.
  • Simple system architecture: the system is built with the simplest structure possible so that people can learn it quickly.
  • Clear design ideas: complete design documents are provided, covering the design of the metadata and the mechanisms, so that the design principles of the system can be grasped quickly.
  • Detailed comments: most functions and attributes carry detailed comments that explain what they do.

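CatDFS is a lightweight, open-source distributed file system implemented in Golang.
It draws on the design of The Google File System and HDFS, with improvements and trade-offs of its own.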

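This project contains four subprojects:

  • master: the master project, the logical center of the system. It manages the chunkservers and the metadata, similar to the NameNode in HDFS.
  • chunkserver: the chunkserver project, the storage node of the system. It stores files, similar to the DataNode in HDFS.
  • client: the client project, through which users interact with the file system.
  • base: the foundation project. It contains the methods, constants and protocol definitions shared by the subprojects, all of which depend on it.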

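As a distributed file system, CatDFS mainly has the following features:

  • File operations: upload (add), download (get), move (move), delete (remove), get file information (stat), list a directory (list) and rename (rename). Append writes (append) will be supported in the future.
  • High reliability: files are stored on different chunkservers under a multi-replica placement strategy, and the number of replicas can be adjusted as a parameter.
  • High availability: the master, which stores the metadata, is deployed on multiple nodes, and the Raft distributed consensus algorithm keeps the metadata consistent. As long as more than half of the master nodes are available, the system continues to operate normally, with no single point of failure.
  • Shrinkage management: when a chunkserver fails, the system performs a shrinkage operation and, according to the strategy, transfers the files stored on that data node to other chunkservers so that no replicas are lost.
  • Expansion management: users can add chunkservers at any time, and the system transfers files from other chunkservers to the new one according to the strategy.
  • Load balancing: when users upload files, or when the system shrinks or expands, the system looks for the best strategy for choosing the chunkservers on which to place files, so that disk usage is roughly balanced across chunkservers.
  • Crash recovery: after a crash, master and chunkserver nodes can be restarted and rejoin the system directly without configuration, and none of the information stored on them is lost.
  • System monitoring: cAdvisor, Prometheus and Grafana are used to visually monitor the system's runtime metrics and load.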
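
The high-availability guarantee stated above, that the system keeps working while more than half of the master nodes are available, is the usual majority-quorum rule of Raft. The arithmetic can be sketched as follows; `HasQuorum` and `MaxTolerableFailures` are hypothetical helper names introduced here for illustration and are not part of CatDFS or of any Raft library.

```go
package main

import "fmt"

// HasQuorum reports whether strictly more than half of the master nodes are
// available. The name and signature are illustrative assumptions.
func HasQuorum(available, total int) bool {
	if total <= 0 || available < 0 || available > total {
		return false
	}
	return available*2 > total
}

// MaxTolerableFailures returns how many master nodes may be down while a
// strict majority of the cluster is still available.
func MaxTolerableFailures(total int) int {
	if total <= 0 {
		return 0
	}
	return (total - 1) / 2
}

func main() {
	for _, total := range []int{1, 3, 4, 5} {
		fmt.Printf("masters=%d tolerable failures=%d\n", total, MaxTolerableFailures(total))
	}
}
```

A 3-master cluster therefore tolerates one failed master and a 5-master cluster tolerates two. A 4-master cluster still tolerates only one, which is why odd cluster sizes are the common choice.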

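As a project suited to newcomers, CatDFS mainly has the following characteristics:

  • Complete functional features: it implements most of the functions and features a distributed file system needs, which helps in understanding and learning distributed systems and the components they depend on.
  • Simple system architecture: the system is built with as concise a structure as possible, preferring to take things away rather than add them.
  • Clear design ideas: complete design documents are provided, covering the design of each kind of metadata and each mechanism, which makes it easy to grasp the design principles of the system quickly.
  • Detailed code comments: the vast majority of functions and attributes have fairly detailed English comments that explain what each function and variable does.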