DISTRIBUTED FILE SYSTEM SUMMARY
RANJANI SANKARAN

Outline
• Characteristics of DFS
• DFS Design and Implementation
• Transaction and Concurrency Control
• Data and File Replication
• Current Work
• Future Work

DFS Characteristics
• Dispersion
  – Dispersed files: location transparency, location independence
  – Dispersed clients: login transparency, access transparency
• Multiplicity
  – Multiple files: replication transparency
  – Multiple clients: concurrency transparency
• Others (general)
  – Fault tolerance: crash of server or client, loss of messages
  – Scalability: incremental file-system growth

DFS STRUCTURE [3]
• DFS Root: top level; holds links to shared folders in a domain.
• DFS Link: a share under the root; the link redirects to a shared folder.
• DFS Replicas or Targets: identical shares on two servers can be grouped together as targets under one link.

MAPPING OF LOGICAL AND PHYSICAL FOLDERS [2]
(Figure slide: the logical DFS namespace mapped onto physical shared folders on servers; see [2].)

DFS Design and Implementation
• Problems: file sharing and file replication

Files and File Systems
• File name: mapping a symbolic name to a unique file id (UFID or file handle) is the function of the directory service.
• File attributes: ownership, type, size, timestamps, access-authorization information.
• Data units: flat or hierarchical structure.
• File access: sequential, direct, indexed-sequential.

COMPONENTS IN A FILE SYSTEM
• Directory service: name resolution, addition and deletion of files
• Authorization service: capability and/or access control list
• File service: transaction service (concurrency and replication management) and basic service (read/write files, get/set attributes)
• System service: device, cache, and block management

Overview of FS Services
• DIRECTORY SERVICE: search, create, delete, and rename files; mapping and locating; list a directory; traverse the file system.
• AUTHORIZATION SERVICE: authorized access for security; read, write, append, execute, delete, and list operations.
• FILE SERVICE: transaction service and basic service: read, write, open, close, delete, truncate, seek.
• SYSTEM SERVICE: replication, caching, mapping of addresses, etc.

SERVICES and SERVERS
• Servers (possibly multiple) implement services; the client/server relationship is relative: a server of one service may itself be a client of another.

File Mounting
• Attaches a remote named file system to the client's file-system hierarchy at the position pointed to by a path name.
• Once files are mounted, they are accessed using the concatenated logical path names, without referencing either the remote hosts or local devices.
• Location transparent.
• The link information (mount table) is kept until the file systems are unmounted.
• Different clients may perceive different FS views. To achieve a global FS view, the system administrator enforces mounting rules; mounting is restricted or allowed through the server's export file.
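To make the mount table concrete, here is a minimal client-side sketch in Python. It assumes the table maps mount points to (server, remote path) pairs and resolves logical path names by longest-prefix match; the MountTable class and its method names are illustrative, not taken from any real DFS.

```python
# Minimal sketch of client-side mount-table resolution (illustrative;
# class and method names are hypothetical, not from a real DFS).

class MountTable:
    def __init__(self):
        # mount point -> (server, remote path); kept until unmount
        self.mounts = {}

    def mount(self, mount_point, server, remote_path):
        self.mounts[mount_point] = (server, remote_path)

    def unmount(self, mount_point):
        self.mounts.pop(mount_point, None)

    def resolve(self, path):
        """Map a logical path to (server, remote path) by longest-prefix match."""
        best = ""
        for mp in self.mounts:
            if path.startswith(mp) and len(mp) > len(best):
                best = mp
        if not best:
            return None        # no mount point matches: a purely local path
        server, remote = self.mounts[best]
        return server, remote + path[len(best):]

table = MountTable()
table.mount("/usr/students", "fileserver1", "/export/students")
print(table.resolve("/usr/students/alice/report.txt"))
# -> ('fileserver1', '/export/students/alice/report.txt')
```

The explicit, boot, and auto-mounting schemes described next differ only in when such a mount() call happens: on request, at boot time, or on first open.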
Types of Mounting
• Explicit mounting: clients make explicit mounting system calls whenever one is desired.
• Boot mounting: a set of file servers is prescribed, and all mountings are performed at the client's boot time.
• Auto-mounting: mounting of the servers is done implicitly, on demand, when a file is first opened by a client.

Server Registration
• The mounting protocol is not transparent: the initial mounting requires knowledge of the location of the file servers.
• Two approaches:
  – Server registration: file servers register their services, and clients consult the registration server before mounting.
  – Broadcast: clients broadcast mounting requests, and file servers respond to the clients' requests.

Stateful and Stateless File Servers
• Stateful file server: the server maintains state information about clients between requests.
• Stateless file server: when a client sends a request, the server carries it out, sends the reply, and then removes from its internal tables all information about the request.
  – Between requests, no client-specific information is kept on the server.
  – Each request must be self-contained: full file name, offset, and so on.
• State information may include:
  – Opened files and their clients
  – File descriptors and file handles
  – Current file-position pointers and mounting information
  – Caches or buffers

File Access and Semantics of Sharing
• File sharing
  – Overlapping access: multiple copies of the same file (cache or replication; space multiplexing). Coherency control provides a coherent view of shared files, manages access to replicas, and makes updates atomic.
  – Interleaved access: multiple granularities of data-access operations (time multiplexing; simple read/write, transaction, session). Concurrency control prevents erroneous or inconsistent results during concurrent access.

Semantics of Sharing/Replication
• Unix semantics (currentness): writes are propagated immediately, so reads return the latest value.
• Transaction semantics (consistency): writes are stored and propagated when consistency constraints are met.
• Session semantics (efficiency): writes are done on a working copy; results are made permanent when the session closes.

REPLICATION
• Write policies
• Cache coherence control
• Version control

Transaction and Concurrency Control
• A concurrency control protocol is required to maintain ACID semantics for concurrent transactions.
• Distributed transaction processing system:
  – Transaction manager: correct execution of local and remote transactions.
  – Scheduler: schedules operations to avoid conflicts, using locks, timestamps, and validation managers.
  – Object manager: coherency of replicas and caches; interface to the file system.

Serializability
• A schedule is serializable if the result of its execution is equivalent to that of some serial schedule (no cyclic hold-and-wait deadlock situations, no holding of conflicting locks, etc.).
• Across transactions, the transaction states must remain consistent.
• Conflicts: write-write, write-read, and read-write operations on a shared object.

Interleaving Schedules
• In the schedule shown under two-phase locking below, operations 1 and 3 both write a sum, and operations 2 and 4 both write a difference, into data objects C and D.
• Only orderings in which (1, 2) and (3, 4) execute as units, i.e. serial schedules, are valid.

Concurrency Control Protocols
• Two-phase locking (2PL):
  – Growing phase (acquire locks only), then shrinking phase (release locks only).
  – Sacrifices concurrency and sharing for serializability.
  – Can produce a circular wait (deadlock), as in this schedule (see the sketch that follows):

    t0: write A = 100; write B = 20
    t1: read A, read B; 1. write sum in C; 2. write diff in D
    t2: read A, read B; 3. write sum in D; 4. write diff in C

  Under 2PL, t1 may lock C for operation 1 while t2 locks D for operation 3; t1 then waits for D while t2 waits for C, and neither can proceed.
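The deadlock above can be made concrete with a minimal Python sketch of the two-phase discipline. The Transaction class, the lock-table dictionary, and the exception are hypothetical simplifications: a real scheduler would block the requester (and detect or prevent deadlock) rather than raise an error.

```python
# Minimal sketch of two-phase locking (illustrative names, not a real DFS API).
# Growing phase: a transaction may only acquire locks.
# Shrinking phase: once it has released any lock, it may not acquire new ones.

class TwoPhaseLockError(Exception):
    pass

class Transaction:
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False

    def lock(self, lock_table, obj):
        if self.shrinking:
            raise TwoPhaseLockError(f"{self.name}: cannot lock {obj} after releasing")
        holder = lock_table.get(obj)
        if holder is not None and holder is not self:
            # A real scheduler would block here; t1 holding C while waiting
            # for D, and t2 holding D while waiting for C, is exactly the
            # circular wait in the schedule above.
            raise TwoPhaseLockError(f"{self.name}: {obj} is held by {holder.name}")
        lock_table[obj] = self
        self.held.add(obj)

    def unlock(self, lock_table, obj):
        self.shrinking = True          # the shrinking phase has begun
        self.held.discard(obj)
        lock_table.pop(obj, None)

locks = {}
t1, t2 = Transaction("t1"), Transaction("t2")
t1.lock(locks, "C")                    # t1: 1. write sum in C
t2.lock(locks, "D")                    # t2: 3. write sum in D
try:
    t1.lock(locks, "D")                # t1: 2. write diff in D
except TwoPhaseLockError as e:
    print(e)                           # t1: D is held by t2
```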
• Solution: release locks as soon as possible. Problem: rolling (cascading) aborts and commit dependence. Solution: strict two-phase locking, which holds locks until the transaction commits or aborts.

Time Stamp Ordering
• Uses logical timestamps or counters; each transaction gets a unique timestamp.
• Larger-timestamp transactions wait for smaller-timestamp transactions; smaller-timestamp transactions die and restart when they confront larger-timestamp transactions (the wait-die scheme).
• Example: t0 (50 ms) < t1 (100 ms) < t2 (200 ms)

    t0: write A = 100; write B = 20 (completed)
    t1: read A, read B; 1. write sum in C; 2. write diff in D
    t2: read A, read B; 3. read sum in C; 4. write diff in C

Time Stamp Ordering Concurrency Control
• RD and WR: the logical timestamps of the last read and last write of each object.
• Tmin: the minimum tentative time for pending writes.

Optimistic Concurrency Control
• Allows the entire transaction to complete, then validates it before making its effects permanent.
• Three phases: execution phase, validation phase, update phase.
• Validation uses a two-phase commit protocol, sending a validation request to all transaction managers.
• Validated updates are committed in the update phase.

Data and File Replication
• Replication serves concurrent access and availability.
• GOAL: one-copy serializability
  – The execution of transactions on replicated objects is equivalent to the execution of the same transactions on non-replicated objects.
  – Read operations: read-one-primary, read-one, read-quorum.
  – Write operations: write-one-primary, write-all, write-all-available, write-quorum, write-gossip.
• Mechanisms: quorum voting, gossip update propagation, causal-order gossip protocol.

ARCHITECTURE
• The client chooses one or more file service agents (FSAs) to access a data object.
• The FSA acts as a front end to the replica managers (RMs) to provide replication transparency.
• The FSA contacts one or more RMs for the actual updating and reading of data objects.

Quorum Voting / Gossip Update Propagation
• Quorum voting uses a read quorum and a write quorum (see the first sketch after this section):
  – To prevent write-write conflicts: 2 * write quorum > total number of object copies.
  – To prevent read-write conflicts: write quorum + read quorum > total number of object copies.
• Gossip update propagation:
  – Read: if TSfsa <= TSrm, the RM holds sufficiently recent data and returns it; otherwise the FSA waits for gossip or tries another RM.
  – Update: if TSfsa > TSrm, perform the update, advance TSrm, and send gossip; otherwise, depending on the application, perform or reject the update.
  – Gossip: update the RM if the gossip message carries new updates.

Gossip Update Protocol
• Used in a fixed RM configuration.
• Uses vector timestamps, and a buffer to keep updates in order (see the second sketch below).
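The two quorum conditions are simple arithmetic over the number of copies, sketched below. The helper names valid_quorums and read_latest are hypothetical, and the read simply returns the highest version number found among a read quorum of replicas.

```python
# Minimal sketch of quorum-size validation and a quorum read (illustrative).
# For N copies of an object, a read quorum R and write quorum W must satisfy:
#   2 * W > N    (any two write quorums overlap: no write-write conflict)
#   R + W > N    (any read quorum overlaps any write quorum, so a read sees
#                 at least one copy carrying the latest version number)

def valid_quorums(n_copies, read_quorum, write_quorum):
    return (2 * write_quorum > n_copies
            and read_quorum + write_quorum > n_copies)

def read_latest(replicas, read_quorum):
    """Contact a read quorum of (version, value) copies; keep the newest."""
    contacted = replicas[:read_quorum]     # any R reachable replicas
    return max(contacted)                  # highest version wins

# N = 5 copies: R = 2, W = 4 satisfies both tests; R = 2, W = 3 does not.
print(valid_quorums(5, 2, 4))    # True
print(valid_quorums(5, 2, 3))    # False: 2 + 3 is not greater than 5

replicas = [(3, "new"), (3, "new"), (2, "old"), (2, "old"), (2, "old")]
print(read_latest(replicas, 2))  # (3, 'new')
```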
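And here is a minimal sketch of gossip propagation with vector timestamps, under stated simplifying assumptions: a fixed set of RMs, exchange of the whole update log, and no pruning of the buffer. The ReplicaManager class and its method names are illustrative, not the protocol's actual interface.

```python
# Minimal sketch of gossip update propagation with vector timestamps
# (illustrative; a real protocol also orders and prunes buffered updates).

class ReplicaManager:
    def __init__(self, rm_id, n_rms):
        self.rm_id = rm_id
        self.ts = [0] * n_rms      # vector timestamp: updates seen per RM
        self.log = []              # buffered updates, kept to preserve order

    def dominates(self, other_ts):
        """True if this RM has seen every update reflected in other_ts."""
        return all(a >= b for a, b in zip(self.ts, other_ts))

    def read(self, fsa_ts):
        # Serve the read only if this replica is at least as recent as
        # what the front end (FSA) has already observed.
        return self.ts if self.dominates(fsa_ts) else None

    def update(self, op):
        # Accept the update, advance our own component, buffer it for gossip.
        # (The TSfsa > TSrm acceptance test above is omitted for brevity.)
        self.ts[self.rm_id] += 1
        self.log.append((list(self.ts), op))
        return list(self.ts)

    def receive_gossip(self, peer):
        # Buffer any updates the peer has that we have not yet seen,
        # then merge the vector timestamps component-wise.
        for entry_ts, op in peer.log:
            if not self.dominates(entry_ts):
                self.log.append((entry_ts, op))
        self.ts = [max(a, b) for a, b in zip(self.ts, peer.ts)]

rm0, rm1 = ReplicaManager(0, 2), ReplicaManager(1, 2)
fsa_ts = rm0.update("write x")     # an FSA sends an update through rm0
print(rm1.read(fsa_ts))            # None: rm1 has not seen the update yet
rm1.receive_gossip(rm0)            # gossip brings rm1 up to date
print(rm1.read(fsa_ts))            # [1, 0]
```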
Current Work
Here are links to some current distributed file system and related projects:
• Ceph: http://ceph.newdream.net/ (petabyte-scale DFS, POSIX-compatible and fault tolerant)
• GlusterFS: http://www.gluster.org/
• HDFS: http://hadoop.apache.org/hdfs/
• HekaFS: http://www.hekafs.org/
• OrangeFS: http://www.orangefs.org/ and http://www.pvfs.org/
• KosmosFS: http://code.google.com/p/kosmosfs/
• MogileFS: http://danga.com/mogilefs/
• Swift (OpenStack Storage): http://www.openstack.org/projects/storage/
• FAST '11 proceedings: http://www.usenix.org/events/fast11/tech/

Future Work
• Usability and scalability issues related to the cost of traversal in distributed file systems, since the traditional model of file traversal may not be suitable for searching and indexing [3].
• File systems adding support for their own indexing (continuous/incremental updates of indexes).
• The NFS family may become increasingly irrelevant for more geographically distributed enterprises.
• Innovations in multi-tenancy and security for distributed/cloud computing.

References
1. R. Chow and T. Johnson, Distributed Operating Systems & Algorithms, 1997.
2. http://www.windowsnetworking.com/articles_tutorials/Implementing-DFS-Namespaces.html – DFS Namespaces reference.
3. http://www.quora.com/Distributed-Systems/What-is-the-future-of-file-systems – The future of file systems.
4. http://www.cs.iit.edu/~iraicu/research/publications/2011_LSAP2011_exascale-storage.pdf – Issues with DFS at exascale.
5. http://www.usenix.org/publications/login/201008/openpdfs/maltzahn.pdf – Ceph as a scalable alternative to Hadoop.
6. http://www-sop.inria.fr/members/Patrick.Valduriez/pmwiki/Patrick/uploads//Conferences/dexa2011.pdf – Distributed data management in 2020?
7. http://www.greenplum.com/media-center/big-data-use-cases/agile-analytics – Hadoop as a possible future solution.

THANK YOU