Google File System for Developers

The Google File System is designed for large files and streaming reads/writes on commodity hardware. It uses a single master and multiple chunk servers to store and replicate file chunks. The master manages metadata and chunk placement, while clients read/write chunks stored on chunk servers. It provides high throughput, fault tolerance, and consistency through techniques like chunk replication, logging, leases, and checksum verification.
The Google File System

Tut Chi Io
Design Overview – Assumption
• Inexpensive commodity hardware
• Large files: Multi-GB
• Workloads
– Large streaming reads
– Small random reads
– Large, sequential appends
• Concurrent append to the same file
• High sustained throughput favored over low latency
Design Overview – Interface
• Create
• Delete
• Open
• Close
• Read
• Write
• Snapshot
• Record Append
Design Overview – Architecture
• Single master, multiple chunk servers,
multiple clients
– User-level processes running on commodity
Linux machines
– GFS client code linked into each client
application to communicate with the master
and chunk servers
• File -> 64MB chunks -> Linux files
– on local disks of chunk servers
– replicated on multiple chunk servers (3 replicas
by default)
• Clients cache metadata but not chunk data
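
Below is a minimal, illustrative Python sketch (not GFS code) of how a client turns a byte range into 64MB chunk indexes and asks the master for chunk handles and replica locations; the Master class, its lookup method, and the sample handles are hypothetical.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size

class Master:
    """Hypothetical master holding file->chunk and chunk->location maps in memory."""
    def __init__(self):
        self.file_chunks = {}      # filename -> list of chunk handles
        self.chunk_locations = {}  # chunk handle -> list of chunk server addresses

    def lookup(self, filename, chunk_index):
        handle = self.file_chunks[filename][chunk_index]
        return handle, self.chunk_locations[handle]

def client_read(master, filename, offset, length):
    # The client converts (offset, length) into chunk indexes, asks the master
    # once per chunk, then talks to the chunk servers directly for the data.
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    for index in range(first, last + 1):
        handle, replicas = master.lookup(filename, index)
        print(f"chunk {index}: handle={handle}, replicas={replicas}")

m = Master()
m.file_chunks["/data/log"] = ["c1", "c2"]
m.chunk_locations = {"c1": ["cs1", "cs2", "cs3"], "c2": ["cs2", "cs4", "cs5"]}
client_read(m, "/data/log", offset=60 * 1024 * 1024, length=8 * 1024 * 1024)
```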
Design Overview – Single Master
• Why centralization? Simplicity!
• Global knowledge is needed for
– Chunk placement
– Replication decisions
Design Overview – Chunk Size
• 64MB – much larger than typical file system block sizes. Why?
– Advantages
• Reduce client-master interaction
• Reduce network overhead
• Reduce the size of the metadata
– Disadvantages
• Internal fragmentation
– Solution: lazy space allocation
• Hot Spots – many clients accessing a 1-chunk file,
e.g. executables
– Solution:
– Higher replication factor
– Stagger application start times
– Client-to-client communication
Design Overview – Metadata
• File & chunk namespaces
– In master’s memory
– In master’s and chunk servers’ storage
• File-chunk mapping
– In master’s memory
– In master’s and chunk servers’ storage
• Location of chunk replicas
– In master’s memory
– Ask chunk servers when
• Master starts
• Chunk server joins the cluster
– If persistent, master and chunk servers must be in
sync
Design Overview – Metadata – In-memory DS
• Why in-memory data structure for the
master?
– Fast! Needed for periodic garbage collection and load balancing
• Will it pose a limit on the number of
chunks -> total capacity?
– No, a 64MB chunk needs less than 64B
metadata (640TB needs less than 640MB)
• Most chunks are full
• Prefix compression on file names
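
A quick back-of-the-envelope check of the 640TB / 640MB figure, assuming the under-64-bytes-per-chunk estimate above:

```python
CHUNK_SIZE = 64 * 1024 ** 2        # 64 MB per chunk
METADATA_PER_CHUNK = 64            # < 64 bytes of metadata per chunk (the estimate)

data = 640 * 1024 ** 4             # 640 TB of file data
chunks = data // CHUNK_SIZE        # 10,485,760 chunks
metadata = chunks * METADATA_PER_CHUNK
print(metadata / 1024 ** 2, "MB")  # 640.0 MB of in-memory metadata
```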
Design Overview – Metadata – Log
• The only persistent record of metadata
• Defines the order of concurrent
operations
• Critical
– Replicated on multiple remote machines
– Respond to the client only after the log record
is flushed both locally and remotely
• Fast recovery by using checkpoints
– Use a compact B-tree like form directly
mapping into memory
– Switch to a new log file and create the new
checkpoint in a separate thread
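
A rough sketch of checkpoint-plus-log recovery: load the latest checkpoint, then replay only the log records written after it. The JSON checkpoint layout, the last_seq field, and the record format are invented for illustration and are not the actual GFS formats.

```python
import json

def recover(checkpoint_path, log_path):
    """Rebuild master state: load the checkpoint, then replay the tail of the log."""
    with open(checkpoint_path) as f:
        state = json.load(f)                 # compact snapshot of the namespace
    with open(log_path) as f:
        for line in f:
            record = json.loads(line)
            if record["seq"] <= state["last_seq"]:
                continue                     # already reflected in the checkpoint
            apply_record(state, record)      # replay only the newer mutations
    return state

def apply_record(state, record):
    if record["op"] == "create":
        state["files"][record["path"]] = []
    elif record["op"] == "delete":
        state["files"].pop(record["path"], None)
```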
Design Overview – Consistency Model
• Consistent
– All clients will see the same data, regardless
of which replicas they read from
• Defined
– Consistent, and clients will see what the
mutation writes in its entirety
Design Overview – Consistency Model
• After a sequence of successful mutations, a region is
guaranteed to be defined
– Same order on all replicas
– Chunk version number to detect stale
replicas
• Can clients cache stale chunk locations?
– Limited by cache entry’s timeout
– Most files are append-only
• A stale replica usually returns a premature end of chunk
rather than outdated data
System Interactions – Lease
• Minimizes management overhead at the master
• Granted by the master to one of the
replicas to become the primary
• Primary picks a serial order for all mutations
and all replicas follow it
• 60 seconds timeout, can be extended
• Can be revoked
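
An illustrative sketch of the master's lease bookkeeping with the 60-second timeout; the LeaseTable class and its methods are hypothetical, not GFS code.

```python
import time

LEASE_SECONDS = 60  # initial lease duration

class LeaseTable:
    """Hypothetical master-side record of which replica holds the primary lease."""
    def __init__(self):
        self.leases = {}  # chunk handle -> (primary address, expiry time)

    def grant(self, handle, replica):
        self.leases[handle] = (replica, time.time() + LEASE_SECONDS)
        return LEASE_SECONDS

    def extend(self, handle, replica):
        primary, expiry = self.leases.get(handle, (None, 0))
        if primary == replica and expiry > time.time():
            self.leases[handle] = (replica, time.time() + LEASE_SECONDS)
            return True
        return False  # lease expired or revoked; a new primary must be chosen

    def revoke(self, handle):
        self.leases.pop(handle, None)
```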
System Interactions – Mutation Order
• Write control flow (figure reconstructed as steps)
1. Client asks the master which chunk server holds the current
lease (the primary) and where the other replicas are
2. Master replies with the identity of the primary and the
locations of the secondaries; the client caches this
3. Client pushes the data to all replicas (3a/3b/3c in the
original figure)
4. Once all replicas acknowledge the data, the client sends
the write request to the primary
5. Primary assigns a serial number to the mutation, applies it,
and forwards the write request to the secondaries
6. Secondaries apply the mutation in the same serial order and
report completion to the primary
7. Primary replies to the client: operation completed, or an
error report that makes the client retry
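
A toy, in-memory simulation of steps 3-7 above (data pushed to every replica first, then applied in the primary's order); the replica class and the 1 KB chunk size are illustrative only.

```python
CHUNK = 1024  # toy chunk size instead of 64 MB

class ChunkReplica:
    """Toy in-memory replica: a fixed-size buffer plus a staging area for pushed data."""
    def __init__(self):
        self.data = bytearray(CHUNK)
        self.staged = None

    def push_data(self, payload):   # step 3: data flow, decoupled from control flow
        self.staged = payload

    def apply(self, offset):        # steps 5-6: apply in the primary's chosen order
        self.data[offset:offset + len(self.staged)] = self.staged
        self.staged = None

def write(replicas, offset, payload):
    """Steps 3-7: push data everywhere, then the primary (replicas[0]) applies
    the mutation and the secondaries follow in the same order."""
    for r in replicas:
        r.push_data(payload)
    primary, secondaries = replicas[0], replicas[1:]
    primary.apply(offset)
    for s in secondaries:
        s.apply(offset)
    return "ok"

replicas = [ChunkReplica() for _ in range(3)]
write(replicas, 0, b"hello gfs")
```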
System Interactions – Data Flow
• Decouple data flow and control flow
• Control flow
– Master -> Primary -> Secondaries
• Data flow
– Carefully picked chain of chunk servers
• Forward to the closest first
• Distances estimated from IP addresses
– Linear (not tree), to fully utilize outbound
bandwidth (not divided among recipients)
– Pipelining, to exploit full-duplex links
• Time to transfer B bytes to R replicas = B/T + RL
• T: network throughput, L: latency
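
Plugging illustrative numbers into B/T + R·L (100 Mbps links and about 1 ms per-hop latency, in the spirit of the paper's example):

```python
def transfer_time(size_bits, replicas, throughput_bps, hop_latency_s):
    """Ideal pipelined transfer time B/T + R*L until all replicas have the data."""
    return size_bits / throughput_bps + replicas * hop_latency_s

# Push 1 MB to 3 replicas over 100 Mbps links with ~1 ms latency per hop.
one_mb_bits = 1 * 1024 ** 2 * 8
print(transfer_time(one_mb_bits, 3, 100e6, 1e-3))  # ~0.087 s, i.e. well under 100 ms
```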
System Interactions – Atomic Record Append
• Concurrent appends are serializable
– Client specifies only data
– GFS appends at least once atomically
– Return the offset to the client
– Heavily used within Google to implement
• multiple-producer/single-consumer queues
• files holding merged results from many different clients
– On failures, the client retries the operation
– Appended data is defined; intervening regions may be
inconsistent
• A reader can identify and discard extra padding
and record fragments using checksums embedded in the records
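
A toy sketch of at-least-once record append with padding and client retry; the tiny chunk size and the bare-bytes record format are illustrative, not GFS's.

```python
CHUNK_SIZE = 1024  # toy chunk size; GFS uses 64 MB and caps records at 1/4 of that

class Chunk:
    def __init__(self):
        self.data = bytearray()

    def record_append(self, record):
        """Append at an offset the chunk server chooses; pad the chunk if it won't fit."""
        if len(self.data) + len(record) > CHUNK_SIZE:
            self.data.extend(b"\0" * (CHUNK_SIZE - len(self.data)))  # padding
            return None                       # tell the client to retry on the next chunk
        offset = len(self.data)
        self.data.extend(record)
        return offset                         # the offset is returned to the client

def append_with_retry(chunks, record):
    """Client retries until some chunk takes the record; retries mean a record may
    appear more than once, so readers must tolerate duplicates and padding."""
    for index, chunk in enumerate(chunks):
        offset = chunk.record_append(record)
        if offset is not None:
            return index, offset
    raise IOError("no chunk could take the record")
```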
System Interactions – Snapshot
• Makes a copy of a file or a directory tree
almost instantaneously
• Use copy-on-write
• Steps
– Revokes outstanding leases on the affected chunks
– Logs the operation to disk
– Duplicates the in-memory metadata, pointing to the same
chunks
• The real chunk copy is created locally, on the same chunk
server, the first time a client writes to a shared chunk
– Disks are 3 times as fast as 100 Mb Ethernet
links
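
A minimal sketch of the copy-on-write metadata duplication, assuming a toy namespace with per-chunk reference counts; the data structures here are invented for illustration.

```python
class Namespace:
    """Toy namespace: file path -> list of chunk handles, plus per-chunk refcounts."""
    def __init__(self):
        self.files = {}       # path -> [chunk handles]
        self.refcount = {}    # chunk handle -> number of files referencing it

    def snapshot(self, src, dst):
        # Copy-on-write: duplicate only the metadata; both files share the same chunks.
        self.files[dst] = list(self.files[src])
        for handle in self.files[dst]:
            self.refcount[handle] = self.refcount.get(handle, 1) + 1
        # A later write to a chunk with refcount > 1 first triggers a local chunk copy.

ns = Namespace()
ns.files["/home/user/data"] = ["c1", "c2"]
ns.snapshot("/home/user/data", "/save/user/data")
print(ns.files, ns.refcount)   # both paths point at c1 and c2; refcounts are now 2
```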
Master Operation – Namespace Management
• No per-directory data structure
• No support for aliases (hard or symbolic links)
• Lock over regions of namespace to
ensure serialization
• Lookup table mapping full pathnames to
metadata
– Prefix compression -> In-Memory
Master Operation – Namespace Locking
• Each node (file/directory) has a read-
write lock
• Scenario: prevent /home/user/foo from
being created while /home/user is
being snapshotted to /save/user
– Snapshot
• Read locks on /home, /save
• Write locks on /home/user, /save/user
– Create
• Read locks on /home, /home/user
• Write lock on /home/user/foo
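
A small sketch of the prefix read-lock / leaf write-lock rule for the /home/user example above; the helper functions are hypothetical.

```python
def path_prefixes(path):
    """/home/user/foo -> ['/home', '/home/user'] (every ancestor directory)."""
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]

def locks_for_create(path):
    """Read-lock every ancestor, write-lock the leaf being created."""
    return [(p, "read") for p in path_prefixes(path)] + [(path, "write")]

def locks_for_snapshot(src, dst):
    """Read-lock the parents, write-lock the directories being snapshotted."""
    return ([(p, "read") for p in path_prefixes(src)] + [(src, "write")] +
            [(p, "read") for p in path_prefixes(dst)] + [(dst, "write")])

# Creating /home/user/foo needs a read lock on /home/user, which conflicts with
# the snapshot's write lock on /home/user, so the two operations serialize.
print(locks_for_create("/home/user/foo"))
print(locks_for_snapshot("/home/user", "/save/user"))
```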
Master Operation – Policies
• New chunk creation policy
– New replicas on servers with below-average disk
utilization
– Limit # of “recent” creations on each chunk
server
– Spread replicas of a chunk across racks
• Re-replication priority
– Far from replication goal first
– Chunk that is blocking client first
– Live files first (rather than deleted)
• Rebalance replicas periodically
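
An illustrative placement heuristic along the lines of the bullets above (below-average utilization, a cap on recent creations, rack spreading); the server record fields are assumptions, not GFS's actual policy code.

```python
def pick_servers_for_new_chunk(servers, replicas=3):
    """Pick chunk servers for a new chunk's replicas.
    `servers` is a list of dicts like
    {"addr": ..., "rack": ..., "util": ..., "recent_creations": ...}."""
    avg_util = sum(s["util"] for s in servers) / len(servers)
    candidates = sorted(
        (s for s in servers if s["util"] <= avg_util),   # below-average utilization
        key=lambda s: s["recent_creations"],              # avoid piling new chunks on one server
    )
    chosen, racks = [], set()
    for s in candidates:
        if s["rack"] in racks:
            continue                                      # spread replicas across racks
        chosen.append(s)
        racks.add(s["rack"])
        if len(chosen) == replicas:
            break
    return chosen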
Master Operation – Garbage Collection
• Lazy reclamation
– Logs deletion immediately
– Rename to a hidden name
• Remove 3 days later
• Undelete by renaming back
• Regular scan of the namespace for orphaned chunks
– A chunk is orphaned if no file-chunk mapping references it
– Chunk replicas are ordinary Linux files under a designated
directory on each chunk server
– Master erases the orphaned chunks' metadata
– HeartBeat replies tell each chunk server which of its chunks
to delete
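
A toy sketch of lazy reclamation via hidden renames and a periodic scan with the 3-day retention; the naming scheme and data structures are invented for illustration.

```python
import time

HIDDEN_PREFIX = ".deleted."
RETENTION = 3 * 24 * 3600   # hidden files are reclaimed after 3 days

def delete(namespace, path):
    """Lazy deletion: log it, then rename to a hidden name carrying the deletion time."""
    hidden = f"{HIDDEN_PREFIX}{int(time.time())}.{path.strip('/').replace('/', '_')}"
    namespace[hidden] = namespace.pop(path)
    return hidden            # undelete = rename back before the retention period ends

def scan(namespace, now=None):
    """Regular namespace scan: drop hidden files older than the retention period."""
    now = now or time.time()
    for name in list(namespace):
        if name.startswith(HIDDEN_PREFIX):
            stamp = int(name.split(".")[2])
            if now - stamp > RETENTION:
                del namespace[name]   # orphaned chunks are reclaimed on a later scan
```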
Master Operation – Garbage Collection
• Advantages
– Simple & reliable
• Chunk creation may fail on some servers
• Deletion messages may be lost
– Uniform and dependable way to clean up
replicas that are no longer useful
– Done in batches and the cost is amortized
– Done when the master is relatively free
– Safety net against accidental, irreversible
deletion
Master Operation – Garbage Collection
• Disadvantage
– Hard to fine tune when storage is tight
• Solution
– Deleting an already-deleted file again expedites
storage reclamation
– Different policies for different parts of the
namespace

Master Operation – Stale Replica Detection
• Master maintains a chunk version number for each chunk
– Replicas with an old version number are stale and are
removed during regular garbage collection
Fault Tolerance – High Availability
• Fast Recovery
– Restore state and start in seconds
– Do not distinguish normal and abnormal
termination
• Chunk Replication
– Different replication levels for different
parts of the file namespace
– Keep each chunk fully replicated as chunk
servers go offline or detect corrupted
replicas through checksum verification
Fault Tolerance – High Availability
• Master Replication
– Log & checkpoints are replicated
– Master failures?
• Monitoring infrastructure outside GFS starts a
new master process
– “Shadow” masters
• Read-only access to the file system when the
primary master is down
• Enhance read availability
• Reads a replica of the growing operation log
Fault Tolerance – Data Integrity
• Use checksums to detect data corruption
• A chunk (64MB) is broken up into 64KB blocks,
each with a 32-bit checksum
• Chunk servers verify checksums before returning data,
so corruption does not propagate
• Record append
– Incrementally update the checksum for the last partial
block; corruption is detected when the block is next read
• Random write
– Read and verify the first and last block first
– Perform write, compute new checksums
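
A minimal sketch of per-64KB-block checksumming and read-time verification; CRC32 is an illustrative choice of 32-bit checksum, not necessarily what GFS uses.

```python
import zlib

BLOCK = 64 * 1024  # one checksum per 64 KB block of a chunk

def checksums(chunk_bytes):
    """32-bit checksum per 64 KB block."""
    return [zlib.crc32(chunk_bytes[i:i + BLOCK]) for i in range(0, len(chunk_bytes), BLOCK)]

def verified_read(chunk_bytes, stored_sums, offset, length):
    """Verify every block that overlaps the requested range before returning data."""
    first, last = offset // BLOCK, (offset + length - 1) // BLOCK
    for b in range(first, last + 1):
        block = chunk_bytes[b * BLOCK:(b + 1) * BLOCK]
        if zlib.crc32(block) != stored_sums[b]:
            raise IOError(f"checksum mismatch in block {b}; read from another replica")
    return chunk_bytes[offset:offset + length]
```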
Conclusion
• GFS supports large-scale data
processing using commodity hardware
• Reexamines traditional file system
assumptions
– based on application workload and
technological environment
– Treat component failures as the norm
rather than the exception
– Optimize for huge files that are mostly
appended
– Relaxes the standard file system interface
Conclusion
• Fault tolerance
– Constant monitoring
– Replicating crucial data
– Fast and automatic recovery
– Checksumming to detect data corruption at
the disk or IDE subsystem level
• High aggregate throughput
– Decouple control and data transfer
– Minimize master involvement through the large chunk size
and chunk leases
Reference
• Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung,
“The Google File System,” SOSP 2003
