Google File System

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
Google
Overview
• NFS
• Introduction & Design Overview
• Architecture
• System Interactions
• Master Operations
• Fault Tolerance
• Conclusion
NFS
• Built on RPCs
• Low performance
• Security issues
Introduction
Need for GFS:
• Large data files
• Scalability
• Reliability
• Automation
• Replication of data
• Fault tolerance
Design Overview:
Assumptions:
• Component failures are the norm, so constant monitoring is required
• Storage of huge files
• Workloads dominated by large streaming reads and writes
• Well-defined semantics for concurrent access by multiple clients
• High sustained bandwidth matters more than low latency
Interface:
• Not POSIX compliant
• Additional operations:
o Snapshot
o Record append
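
The paper never publishes the client library's signatures, so the following Python sketch of the interface is purely illustrative; every name and parameter here is an assumption:

# Hypothetical sketch of the GFS client interface (the real client
# library is C++ and its exact API is not given in the paper).

class GFSClient:
    # Usual file operations.
    def create(self, path: str) -> None: ...
    def delete(self, path: str) -> None: ...
    def read(self, path: str, offset: int, length: int) -> bytes: ...
    def write(self, path: str, offset: int, data: bytes) -> None: ...

    # GFS-specific additions.
    def snapshot(self, src: str, dst: str) -> None:
        """Cheap copy-on-write copy of a file or directory tree."""

    def record_append(self, path: str, data: bytes) -> int:
        """Append atomically at an offset GFS chooses and return that
        offset; the record is written at least once."""
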
Architecture:
• Cluster computing
• Single master
• Multiple chunk servers
o Store file chunks, each identified by a 64-bit chunk handle
• Multiple clients
Single Master, Chunk Size & Metadata
Single Master:
• Minimal master load
• Fixed chunk size
• The master also predictively provides the locations (and unique handles) of chunks immediately following those requested, so sequential readers rarely have to ask again
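
A rough Python sketch of that read path, with invented RPC names (find_chunk, read_chunk); only the division of labor is from the paper — the master serves metadata, bulk data flows from a chunk server:

# Sketch of a client read under a single master.

CHUNK_SIZE = 64 * 2**20            # 64 MB chunks

def pick_replica(replicas):
    # Placeholder policy: real clients prefer a nearby replica.
    return replicas[0]

def read(master, path, offset, length):
    chunk_index = offset // CHUNK_SIZE
    # One small metadata RPC; the master's reply can also carry the
    # locations of the chunks that follow the one requested.
    handle, replicas = master.find_chunk(path, chunk_index)
    server = pick_replica(replicas)
    return server.read_chunk(handle, offset % CHUNK_SIZE, length)
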
Chunk Size:
• 64 MB per chunk
• Many reads and writes operate on the same chunk
• Reduces network overhead and the size of metadata in the master
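
The arithmetic behind that choice, assuming the paper's figure of under 64 bytes of master metadata per chunk and an illustrative 1 PB of file data:

# Why 64 MB chunks keep the master's metadata small. The 1 PB
# workload is an assumed example; the <64 bytes/chunk bound is the paper's.

CHUNK_SIZE = 64 * 2**20                 # 64 MB
data = 2**50                            # 1 PB of file data (assumption)
chunks = data // CHUNK_SIZE             # 16,777,216 chunks
metadata_bytes = chunks * 64            # upper bound on master metadata
print(chunks, metadata_bytes / 2**30)   # -> 16777216 1.0 (about 1 GiB)
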
Metadata:
• Types of metadata:
o File and chunk namespaces
o Mapping from files to chunks
o Location of each chunk's replicas
• In-memory data structures:
o Master operations are fast
o Periodically scanning the entire state is easy and efficient
• Chunk locations:
o The master polls chunk servers for this information
o Clients request data directly from chunk servers
• Operation log:
o Keeps a historical record of critical metadata changes
o It is central to GFS
o It is replicated to multiple remote locations
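
A toy sketch of the distinction just listed, with invented field and record names: namespaces and file-to-chunk mappings are made durable through the operation log, while chunk locations are simply re-learned from the chunk servers:

import collections

class MasterState:
    def __init__(self):
        self.chunks = {}                                  # path -> [chunk handles]
        self.locations = collections.defaultdict(list)    # handle -> [servers]

    def replay(self, log):
        # Recover durable metadata by replaying the operation log,
        # which is kept locally and on multiple remote machines.
        for op, path, handle in log:
            if op == "create":
                self.chunks[path] = []
            elif op == "add_chunk":
                self.chunks[path].append(handle)

    def heartbeat(self, server, handles):
        # Chunk locations are never logged: chunk servers report what
        # they hold, at master startup and in regular heartbeats.
        for h in handles:
            self.locations[h].append(server)
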
System Interactions:
• Leases and Mutation Order:
o Leases maintain a consistent mutation order across the replicas
o The master picks one replica as the primary
o The primary defines a serial order for mutations
o All replicas follow the same serial order
o Leases minimize management overhead at the master
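
A minimal sketch of the mechanism (class and method names are invented): the primary holding the lease stamps each mutation with a serial number, and every replica applies mutations in that order:

class Replica:
    def __init__(self):
        self.log = []

    def apply(self, serial, data):
        self.log.append((serial, data))    # applied in serial order

class Primary(Replica):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def mutate(self, data):
        serial = self.next_serial          # the primary picks the order once
        self.next_serial += 1
        self.apply(serial, data)
        for s in self.secondaries:         # secondaries follow the same order
            s.apply(serial, data)
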
• Atomic Record Appends:
o GFS offers a record append operation
o Clients on different machines can append to the same file concurrently
o The data is written at least once as an atomic unit
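
A sketch of the primary-side logic, under assumed names; on failure at any replica the client simply retries, which is why a record may land more than once:

CHUNK_SIZE = 64 * 2**20

def record_append(chunk: bytearray, record: bytes):
    # The record must not straddle a chunk boundary: pad the current
    # chunk and signal the client to retry on a fresh chunk.
    if len(chunk) + len(record) > CHUNK_SIZE:
        chunk.extend(b"\0" * (CHUNK_SIZE - len(chunk)))
        return None                  # means "retry on the next chunk"
    offset = len(chunk)              # the primary chooses the offset
    chunk.extend(record)             # all replicas write at this offset
    return offset                    # returned to the client
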
• Snapshot:
o Creates a quick copy of a file or directory tree
o The master revokes outstanding leases on the affected chunks
o The master duplicates the metadata
o On the first write to a chunk after the snapshot operation, the chunk servers holding it create a new chunk
o The data can be copied locally
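
The copy-on-write idea in miniature, with invented structures: the snapshot itself only duplicates metadata and bumps reference counts, and chunk data is cloned lazily on the first later write:

refcount = {}     # chunk handle -> number of files referencing it
files = {}        # path -> list of chunk handles

def snapshot(src, dst):
    files[dst] = list(files[src])              # duplicate metadata only
    for h in files[dst]:
        refcount[h] = refcount.get(h, 1) + 1

def write_chunk(path, index, fresh_handle):
    h = files[path][index]
    if refcount.get(h, 1) > 1:
        # Shared with a snapshot: the chunk server copies the chunk
        # locally (no network transfer) and the writer gets the copy.
        refcount[h] -= 1
        refcount[fresh_handle] = 1
        files[path][index] = fresh_handle
    # ...normal write path continues on files[path][index]
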
Master Operations
• Namespace Management and Locking:
o GFS maps each full pathname to its metadata in a lookup table
o Each master operation acquires a set of locks before it runs
o The locking scheme allows concurrent mutations in the same directory
o Locks are acquired in a consistent total order to prevent deadlock
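
A sketch of that locking discipline (the scheme is the paper's; the code's names are invented): read-lock every ancestor directory, write-lock the target, and take all locks in one global order:

import collections, threading

# Plain mutexes stand in for the reader-writer locks GFS uses.
locks = collections.defaultdict(threading.Lock)

def acquire_for(path):
    parts = path.strip("/").split("/")
    ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    needed = ancestors + [path]      # read locks + one write lock
    # Consistent total order (depth, then name) prevents deadlock.
    for name in sorted(needed, key=lambda p: (p.count("/"), p)):
        locks[name].acquire()
    return needed                    # release in reverse when done
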
• Replica Placement:
o Maximizes reliability, availability, and network bandwidth utilization
o Spreads chunk replicas across racks
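
A toy placement policy in that spirit; the selection heuristic is invented, only the rack-spreading goal is the paper's:

def place_replicas(servers_by_rack, n=3):
    # One replica per rack, so a single rack failure cannot destroy
    # every copy; prefer racks with more servers to choose from.
    racks = sorted(servers_by_rack.items(), key=lambda kv: -len(kv[1]))
    return [servers[0] for rack, servers in racks[:n]]

print(place_replicas({"r1": ["s1", "s2"], "r2": ["s3"], "r3": ["s4"]}))
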
Creation, Re-replication, Rebalancing
• Creation:
o Equalize disk utilization
o Limit the number of recent creations on each chunk server
o Spread replicas across racks
• Re-replication:
o Chunks are re-replicated, in priority order, as soon as the replica count falls below the goal (see the sketch after this list)
• Rebalancing:
o Move replicas for better disk space and load balancing
o Prefer removing replicas on chunk servers with below-average free space
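
A sketch of how such a priority might be computed; the factors are the paper's (distance from the replication goal, whether a client is blocked), the weights are invented:

def priority(live_replicas, goal=3, blocking_client=False):
    missing = goal - live_replicas       # losing 2 of 3 beats losing 1 of 3
    return missing + (1 if blocking_client else 0)

chunks = [("a", 2), ("b", 1), ("c", 2)]
chunks.sort(key=lambda c: priority(c[1]), reverse=True)
print(chunks)    # chunk "b", two replicas short, is re-replicated first
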
• Garbage Collection:
o Makes the system simpler and more reliable
o The master logs the deletion and renames the file to a hidden name
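
Lazy deletion in miniature, with invented structures; the three-day grace period is the paper's default:

import time

files = {}        # visible namespace: name -> chunk handles
hidden = {}       # hidden name -> (deletion time, chunk handles)
GRACE = 3 * 24 * 3600     # three days

def delete(name):
    # Deletion is just a rename; the file can still be read under the
    # hidden name and undeleted until the scan reclaims it.
    hidden[".deleted." + name] = (time.time(), files.pop(name))

def namespace_scan():
    for name, (stamp, handles) in list(hidden.items()):
        if time.time() - stamp > GRACE:
            del hidden[name]    # chunks become orphans; chunk servers
                                # learn this in heartbeats and drop them
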
• Stale Replica Detection:
o Chunk version numbers identify stale replicas
o The client or chunk server verifies the version number
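
The check itself is a single comparison: the master increments a chunk's version whenever it grants a new lease, so a replica that was down during a mutation is left behind (values below are invented):

master_version = {"chunk-1": 7}
replica_version = {"chunk-1": 6}    # this replica missed a lease grant

def is_stale(handle):
    return replica_version[handle] < master_version[handle]

assert is_stale("chunk-1")   # reported to the master and garbage collected
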
Fault Tolerance
• High Availability:
o Fast recovery
o Chunk replication
o Shadow masters
• Data Integrity:
o Checksums for every 64 KB block in each chunk
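
A sketch of block-level checksumming; the paper specifies 32-bit checksums over 64 KB blocks but not the algorithm, so CRC-32 here is an assumption:

import zlib

BLOCK = 64 * 1024      # 64 KB blocks within each 64 MB chunk

def block_checksums(chunk: bytes):
    return [zlib.crc32(chunk[i:i + BLOCK]) for i in range(0, len(chunk), BLOCK)]

def verify(chunk: bytes, stored):
    # Checked on every read; on a mismatch the chunk server returns an
    # error, reports to the master, and the client reads another replica.
    return block_checksums(chunk) == stored
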
Conclusion
GFS meets Google's storage requirements:
• Incremental growth
• Constant monitoring of component failures
• Data optimization from special operations
• Simple architecture
• Fault tolerance