Chapter 6.5 Distributed File Systems Summary
Junfei Wen
Fall 2013

Outline
• 6.1 Characteristics of DFS
• 6.2 DFS Design and Implementation
• 6.3 Transaction and Concurrency Control
• 6.4 Data and File Replication
• Current Work
• Future Work

6.1 Characteristics of DFS
Dispersion
• Dispersed files
  – Location transparency
  – Location independence
• Dispersed clients
  – Login transparency
  – Access transparency
Multiplicity
• Multiple files
  – Replication transparency
• Multiple clients
  – Concurrency transparency
Others (general)
• Fault tolerance
  – Crash of server or client, loss of messages
• Scalability
  – Incremental file system growth
• Efficiency

6.2 DFS Design and Implementation
• Hierarchical file structure
• File mounting protocol
  – Explicit mounting: performed manually
  – Boot mounting: performed at boot time
  – Auto mounting: performed on first use
• Distribution of state information between server and clients: stateless or stateful server
• File access
  – Space multiplexing: multiple copies of a file
    • Remote access
    • Cache access
    • Download/upload access
  – Time multiplexing: concurrency control for access to the same file at different times
    • Simple RW
    • Transaction
    • Session
• File sharing semantics
  – Unix: updates propagated immediately
  – Session: delayed update
  – Transaction: delayed update

Components in a File System

6.3 Transaction and Concurrency Control
• Distributed transaction processing system:
  – Transaction manager: ensures correct execution of local and remote transactions.
  – Scheduler: schedules operations to avoid conflicts, using locks, timestamps, and validation managers.
  – Object manager: maintains coherency of replicas and caches; provides the interface to the file system.
• Serializability: a schedule is serializable if the result of its execution is equivalent to that of a serial schedule.
• A concurrency control protocol is required to maintain ACID semantics for concurrent transactions.
  – Two-phase locking
  – Timestamp ordering
  – Optimistic

6.4 Data and File Replication
• Architecture
• Client chooses one or more FSAs to access a data object.
• FSA acts as a front end to the replica managers (RMs) to provide replication transparency.
• FSA contacts one or more RMs for the actual updating and reading of data objects.
• One-copy serializability:
  – The execution of transactions on replicated objects is equivalent to the execution of the same transactions on non-replicated objects.
  – Read operations: read-one-primary, read-one, read-quorum
  – Write operations: write-one-primary, write-all, write-all-available, write-quorum, write-gossip
• Quorum voting: uses a read quorum and a write quorum
  – Write-write conflict: 2 * write quorum > all object copies
  – Read-write conflict: write quorum + read quorum > all object copies
• Gossip update propagation:
  – Read: if TSfsa <= TSrm, the RM has recent data and returns it; otherwise wait for gossip or try another RM.
  – Update: if TSfsa > TSrm, perform the update, update TSrm, and send gossip; otherwise, process based on the application: perform the update or reject it.
  – Gossip: update the RM if the gossip carries new updates.

Current Work
• A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Service
• Intensive Workload Consolidation for the Hadoop Distributed File Systems
• An Integrated High-Performance Distributed File System Implementation on Existing Local Network
• A Cost-Effective File Lookup Service in a Distributed Metadata File System
• The Mobile Agent-Based Distributed Network File System

Future Work
• Innovations in security for distributed/cloud computing
• Improve the efficiency of concurrency control protocols for parallel/distributed systems
• Improve the efficiency and effectiveness of file replication schemes
• Integrate file replication and consistency maintenance

References
[1] Randy Chow & Theodore Johnson, Distributed Operating Systems & Algorithms, 1997.
[2] "Optimizing File Replication over Limited-Bandwidth Networks using Remote Differential Compression", IEEE Infocom Conference, 2006.
[3] Connolly & Begg, Transaction Management and Concurrency Control, Chapter 19, third edition.
[4] "Distributed File System Replication: Frequently Asked Questions"; http://technet2.microsoft.com/WindowsServer/en/library/f9b98a0f-c1ae-4a9f-972480c679596e6b1033.mspx?mfr=true
[5] http://blogs.cs.st-andrews.ac.uk/angus/2009/09/
[6] "Future of File Systems"; http://www.quora.com/Distributed-Systems/What-isthe-future-of-file-systems

Thank you!
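
Supplement: quorum voting check (6.4)
The two quorum voting rules in 6.4 can be checked mechanically. The sketch below is only an illustration of those two inequalities; the function names quorum_is_valid and valid_quorums are hypothetical and do not come from the slides or from any library.

```python
def quorum_is_valid(n_copies: int, read_quorum: int, write_quorum: int) -> bool:
    """Return True if the quorums satisfy both overlap rules from section 6.4.

    Hypothetical helper for illustration only.
    """
    # Write-write rule: 2 * write quorum > all object copies,
    # so any two write quorums share at least one copy.
    no_write_write_conflict = 2 * write_quorum > n_copies
    # Read-write rule: write quorum + read quorum > all object copies,
    # so every read quorum overlaps every write quorum.
    no_read_write_conflict = read_quorum + write_quorum > n_copies
    return no_write_write_conflict and no_read_write_conflict


def valid_quorums(n_copies: int) -> list[tuple[int, int]]:
    """List every (read_quorum, write_quorum) pair that is valid for n_copies."""
    return [
        (r, w)
        for r in range(1, n_copies + 1)
        for w in range(1, n_copies + 1)
        if quorum_is_valid(n_copies, r, w)
    ]


if __name__ == "__main__":
    print(quorum_is_valid(5, 3, 3))  # True: 6 > 5 and 6 > 5
    print(quorum_is_valid(5, 2, 3))  # False: 2 + 3 is not greater than 5
    print(valid_quorums(3))
```

With 3 copies, the valid pairs include (1, 3), which corresponds to read-one with write-all, and (2, 2), a majority quorum for both reads and writes.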