Neha Purohit

Why replicate
• Performance
• Reliability
• Resource sharing
• Network resource saving

Challenge
• Transparency
• Replication
• Concurrency control
• Failure recovery
• Serialization

Atomicity
In database systems, atomicity is one of the ACID transaction properties. An atomic transaction is a series of database operations that either all occur or all do not occur [1].
• All or nothing

Atomicity
In a DFS (Distributed File System), replicated objects (data or files) should follow atomicity rules, i.e., either all copies are updated (synchronously or asynchronously) or none are.

Goal
One-copy serializability: the effect of transactions performed by clients on replicated objects should be the same as if they had been performed one at a time on a single set of objects [2].

Architecture [3]
[Figure: clients connect to file service agents (FSA), which access replica managers (RM)]

Read Operations [3]
• Read-one-primary: the FSA reads only from the primary RM, which enforces consistency
• Read-one: the FSA may read from any RM, which provides concurrency
• Read-quorum: the FSA must read from a quorum of RMs to determine the currency of the data

Write Operations [3]
• Write-one-primary: write only to the primary RM; the primary RM updates all other RMs
• Write-all: update all RMs
• Write-all-available: write to all functioning RMs. A faulty RM needs to be synchronized before it is brought back online.
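The difference between write-all and write-all-available can be sketched in a few lines of Python. This is a minimal in-memory illustration, not code from any real DFS: the `ReplicaManager` class and the functions `write_all`, `write_all_available`, and `resync` are hypothetical names chosen for the example, and the availability check done up front stands in for a real commit protocol.

```python
class ReplicaManager:
    """Hypothetical in-memory replica manager (RM) holding one copy of each object."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.store = {}

    def write(self, key, value):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.store[key] = value


def write_all(rms, key, value):
    """Write-all: update every RM, or leave every RM unchanged (all or nothing)."""
    # Assumption: refusing the write up front when any RM is down is enough
    # for this sketch; an RM failing mid-write is not handled here.
    if not all(rm.available for rm in rms):
        return False
    for rm in rms:
        rm.write(key, value)
    return True


def write_all_available(rms, key, value):
    """Write-all-available: update only functioning RMs and return the skipped ones."""
    stale = []
    for rm in rms:
        if rm.available:
            rm.write(key, value)
        else:
            stale.append(rm)  # must be synchronized before coming back online
    return stale


def resync(stale_rm, healthy_rm):
    """Bring a recovered RM up to date by copying the state of a healthy RM."""
    stale_rm.store = dict(healthy_rm.store)
    stale_rm.available = True
```

With one RM down, `write_all` changes nothing and returns `False`, while `write_all_available` updates the functioning RM and reports the faulty one so that `resync` can be run before it serves requests again.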
Write Operations (continued)
• Write-quorum: update a predefined quorum of RMs
• Write-gossip: update any RM; the update is lazily propagated to the other RMs

Read one primary, write one primary
• Other RMs are backups of the primary RM
• No concurrency
• Easily serialized
• Simple to implement
• Achieves one-copy serializability
• The primary RM is a performance bottleneck

Read one, write all
• Provides concurrency
• A concurrency control protocol is needed to ensure consistency (serialization)
• Achieves one-copy serializability
• Difficult to implement (a failed RM will block any update)

Read one, write all available
• A variation of read one, write all
• May not guarantee one-copy serializability
• Issue of lost conflicts between transactions

Read quorum, write quorum
• A version number is attached to each replicated object
• On a read, the copy with the highest version number is the latest
• A write operation advances the version by 1
• Write quorum > half of all object copies
• Write quorum + read quorum > all object copies

Gossip Update
• Applicable to situations with frequent reads and infrequent updates
• Increased performance
• Typically read one, write gossip
• Uses timestamps

Basic Gossip Update
• Used for overwrite
• Three operations: read, update, gossip arrival
• Read: if TSfsa <= TSrm, the RM has recent data and returns it; otherwise wait for gossip or try another RM
• Update: if TSfsa > TSrm, perform the update, update TSrm, and send gossip. Otherwise, process based on the application: perform the update or reject it
• Gossip arrival: update the RM if the gossip carries new updates
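The three Basic Gossip Update operations can be sketched as follows. This is an illustrative Python sketch under several assumptions: the class `GossipRM`, its `outbox` queue, and the helper `propagate` are hypothetical names, timestamps are plain integers, and a stale update is simply rejected (the slide leaves that choice to the application).

```python
class GossipRM:
    """Hypothetical replica manager for overwrite-style data with a scalar timestamp."""

    def __init__(self):
        self.ts = 0        # TSrm: timestamp of the newest update this RM holds
        self.value = None
        self.outbox = []   # gossip messages waiting to be sent to other RMs

    def read(self, ts_fsa):
        """Return (value, TSrm) if TSfsa <= TSrm, else None."""
        if ts_fsa <= self.ts:
            return self.value, self.ts
        return None  # the caller waits for gossip or tries another RM

    def update(self, ts_fsa, value):
        """Apply the update if TSfsa > TSrm, advance TSrm, and queue gossip."""
        if ts_fsa > self.ts:
            self.value, self.ts = value, ts_fsa
            self.outbox.append((ts_fsa, value))
            return True
        return False  # assumption: this sketch rejects stale updates

    def gossip_arrive(self, ts, value):
        """Accept a gossip message only if it carries a newer update."""
        if ts > self.ts:
            self.value, self.ts = value, ts
            return True
        return False


def propagate(source, targets):
    """Deliver all queued gossip from source to each target RM."""
    for msg in source.outbox:
        for rm in targets:
            rm.gossip_arrive(*msg)
    source.outbox.clear()
```

A read against an RM that has not yet received the gossip returns `None`, which models the "wait for gossip, or try another RM" branch; after `propagate` runs, the same read succeeds.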
Causal Order Gossip Protocol [3]
• Used for read-modify
• In a fixed RM configuration
• Uses vector timestamps
• Uses a buffer to keep the order

Windows Server 2008 [4]
• Supports DFS
• “State based, multi master” scheduled replication
• Uses namespaces for transparent file sharing
• Uses Remote Differential Compression to propagate only the changes, saving bandwidth

FUTURE WORK IN FILE REPLICATION

DISTRIBUTED FILE SYSTEMS: STATE OF THE ART
• GFS: Google File System
–Google
–C/C++
• HDFS: Hadoop Distributed File System
–Yahoo
–Java, Open Source
• Sector: Distributed Storage System
–University of Illinois at Chicago
–C++, Open Source

FILE SYSTEMS OVERVIEW
• System that permanently stores data
• Usually layered on top of a lower-level physical storage medium
• Divided into logical units called “files”
–Addressable by a filename (“foo.txt”)
–Usually supports hierarchical nesting (directories)
• A file path joins file and directory names into a relative or absolute address to identify a file (“/home/aaron/foo.txt”)

SHARED/PARALLEL/DISTRIBUTED FILE SYSTEMS
• Support access to files on remote servers
• Must support concurrency
–Make varying guarantees about locking, who “wins” with concurrent writes, etc.
–Must gracefully handle dropped connections
• Can offer support for replication and local caching
• Different implementations sit in different places on the complexity/feature scale

SECTOR AND SPHERE
• Sector: Distributed Storage System
• Sphere: Run-time middleware that supports simplified distributed data processing
• Open source software, GPL, written in C++
• Started in 2006, current version 1.18
• http://sector.sf.net

Sector: Brief Definition
Sector is an open source data cloud model. Its assumptions:
• A high-bandwidth data link is present among the racks in a data center and also among different data centers.
• Individual applications may have to process large streams of data and produce equally large output streams.
SECTOR: DISTRIBUTED STORAGE SYSTEMS
• Sector stores files on the native/local file system of each slave node.
• Sector does not split files into blocks
–Pro: simple/robust, suitable for wide area
–Con: file size limit
• Sector uses replication for better reliability and availability
• The master node maintains the file system metadata. No permanent metadata is needed.
• Topology aware

SECTOR: WRITE/READ
• Write is exclusive
• Replicas are updated in a chained manner: the client updates one replica, then this replica updates another, and so on. All replicas are updated upon the completion of a write operation.
• Read: different replicas can serve different clients at the same time. The replica nearest to the client is chosen whenever possible.

SECTOR: TOOLS AND API
• Supported file system operations: ls, stat, mv, cp, mkdir, rm, upload, download
–Wildcard characters supported
• System monitoring: sysinfo
• C++ API: list, stat, move, copy, mkdir, remove, open, close, read, write, sysinfo

Sector: Architecture
Sector manages its data with the help of the following components.
• Security Server: authenticates the clients and the slave servers
• Master Server: holds the file metadata and schedules the work among the slave servers
• Slave Servers: hold the datasets, which are divided among them in the form of Linux files

Sphere: A Brief Definition
Sphere is a programming model built over the Sector data cloud architecture. It falls under the Single Instruction, Multiple Data category of Flynn’s taxonomy.

Sphere: Computing Model
Sphere identifies the individual records in a file with the help of index files. Each record, or group of records, is treated as an independent data entity that can be processed in parallel by different slave nodes, similar to data parallelism. These slave nodes are managed by Sphere Engines (SPE), and the SPEs are scheduled by Sphere. Multiple stages of SPEs can be coupled together.
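The record-level computing model described above can be illustrated with a small Python sketch. It is not Sphere's API: the function names `split_records`, `run_stage`, and `run_pipeline` are hypothetical, the index is assumed to be a list of (offset, length) pairs rather than Sector's actual index file format, and a local thread pool stands in for SPEs running on slave nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_records(data, index):
    """Slice a data file into records using (offset, length) pairs from an index."""
    return [data[off:off + length] for off, length in index]


def run_stage(records, func, workers=4):
    """Apply func to every record independently; each worker stands in for an SPE."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, records))  # map preserves the record order


def run_pipeline(data, index, stages, workers=4):
    """Couple several stages: the output records of one stage feed the next."""
    records = split_records(data, index)
    for func in stages:
        records = run_stage(records, func, workers)
    return records
```

For example, `run_pipeline(b"alphabetagamma", [(0, 5), (5, 4), (9, 5)], [bytes.upper, len])` first uppercases each of the three records and then measures their lengths, with each record handled independently of the others.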
[Figures: Sphere processing; MapReduce]

Similarity to Map Reduce
The Sphere model is quite similar to Map Reduce: both deal with data parallelism and allow at least two stages of processing elements to be coupled. Map Reduce shuffles the data among the slave nodes in the second stage, whereas Sphere allows the output of the first stage to be distributed among different processing nodes of the second stage.

References
[1] Wikipedia; http://en.wikipedia.org/wiki/Atomicity
[2] M. T. Harandi; J. Hou (modified: I. Gupta); "Transactions with Replication"; http://www.crhc.uiuc.edu/~nhv/428/slides/repl-trans.ppt
[3] Randy Chow, Theodore Johnson; "Distributed Operating Systems & Algorithms"; 1998
[4] "Overview of the Distributed File System Solution in Microsoft Windows Server 2003 R2"; http://technet2.microsoft.com/WindowsServer/en/library/d3afe6ee3083-4950-a093-8ab748651b761033.mspx?mfr=true
[5] "Distributed File System Replication: Frequently Asked Questions"; http://technet2.microsoft.com/WindowsServer/en/library/f9b98a0f-c1ae-4a9f-9724-80c679596e6b1033.mspx?mfr=true
[6] http://code.google.com/edu/parallel/dsd-tutorial.html
[7] http://code.google.com/edu/parallel/mapreducetutorial.html
[8] http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/gfs-sosp2003.pdf
[9] http://arxiv.org/ftp/arxiv/papers/0809/0809.1181.pdf