Lecture for Chapter 6.4 (Fall 11)

Neha Purohit
Why replicate
• Performance
• Reliability
• Resource sharing
• Network resource saving
Challenge
Transparency
Replication
Concurrency control
Failure recovery
Serialization
Atomicity
• In database systems, atomicity is one of the
ACID transaction properties. An atomic
transaction is a series of database operations
which either all occur, or all do not occur[1].
• All or nothing
Atomicity
• In a DFS (Distributed File System), replicated
objects (data or files) should follow the atomicity
rule, i.e., either all copies are updated
(synchronously or asynchronously) or none are.
Goal
One-copy serializability:
The effect of transactions performed by clients
on replicated objects should be the same as if
they had been performed one at a time on a
single set of objects.[2]
Architecture[3]
[Figure: architecture diagram with Clients, FSAs, and RMs]
Read operations [3]
• Read-one-primary: the FSA reads only from the
primary RM (favors consistency)
• Read-one: the FSA may read from any RM
(favors concurrency)
• Read-quorum: the FSA must read from a quorum
of RMs to determine which copy of the data is current
Write Operations[3]
• Write-one-primary: write only to the primary RM;
the primary RM updates all other RMs
• Write-all: update all RMs
• Write-all-available: write to all functioning
RMs. A faulty RM must be synchronized before
it is brought back online.
Write Operations
• Write-quorum: update a predefined
quorum of RMs
• Write-gossip: update any RM; the update is
lazily propagated to the other RMs
Read one primary, write one primary
• Other RMs are backups of the primary RM
• No concurrency
• Easy to serialize
• Simple to implement
• Achieves one-copy serializability
• The primary RM is a performance bottleneck
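The primary-copy scheme above can be sketched in a few lines. This is a minimal illustration, not code from the lecture: the class names `ReplicaManager` and `PrimaryCopyService` and the in-memory dictionary store are assumptions made for the example.

```python
class ReplicaManager:
    """A replica manager (RM) holding one copy of each object (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.store = {}


class PrimaryCopyService:
    """Read-one-primary / write-one-primary: every request goes to the primary."""
    def __init__(self, primary, backups):
        self.primary = primary
        self.backups = backups

    def write(self, key, value):
        # The primary applies the update first, then pushes it to every backup.
        self.primary.store[key] = value
        for rm in self.backups:
            rm.store[key] = value

    def read(self, key):
        # Reads never touch the backups, so all requests are serialized
        # through a single RM (which is also why it becomes the bottleneck).
        return self.primary.store.get(key)


primary = ReplicaManager("rm0")
backups = [ReplicaManager("rm1"), ReplicaManager("rm2")]
service = PrimaryCopyService(primary, backups)
service.write("foo.txt", "v1")
```

Because every read and write passes through `primary`, ordering is trivial, at the cost of concurrency.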
Read one, Write all
• Provides concurrency
• A concurrency control protocol is needed to
ensure consistency (serialization)
• Achieves one-copy serializability
• Difficult to implement (a failed TM
will block any updates)
Read one, Write all available
• A variation of Read one, Write all
• May not guarantee one-copy serializability
• Issue: conflicts between transactions may be lost
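The difference between write-all and write-all-available can be shown with a small sketch. The `RM` class, the `up` flag, and the `resync` helper are hypothetical names introduced for illustration; a real system would detect failures and resynchronize through a protocol rather than a flag.

```python
class RM:
    """Replica manager with a simple up/down flag (assumption for the sketch)."""
    def __init__(self, name):
        self.name = name
        self.up = True
        self.store = {}


def write_all(rms, key, value):
    """Write-all: refuse the update unless every RM is reachable."""
    if not all(rm.up for rm in rms):
        return False  # a single failed RM blocks the update
    for rm in rms:
        rm.store[key] = value
    return True


def write_all_available(rms, key, value):
    """Write-all-available: update only the RMs that are currently up."""
    updated = [rm for rm in rms if rm.up]
    for rm in updated:
        rm.store[key] = value
    return [rm.name for rm in updated]


def resync(rm, source):
    """Bring a recovered RM up to date before it goes back online."""
    rm.store = dict(source.store)
    rm.up = True


rms = [RM("rm0"), RM("rm1"), RM("rm2")]
rms[2].up = False
blocked = write_all(rms, "f", "v1")
written = write_all_available(rms, "f", "v1")
stale_before_resync = "f" not in rms[2].store
resync(rms[2], rms[0])
```

If `rm2` served reads before `resync` ran, a client could miss the update, which is one way the scheme can fail to give one-copy serializability.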
Read quorum, Write quorum
• A version number is attached to each replicated object
• In a read, the copy with the highest version number is the
latest copy.
• A write operation advances the version by 1
• Write quorum > half of all object copies
• Write quorum + read quorum > number of object
copies
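The quorum rules above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: `QuorumStore` and `Replica` are hypothetical names, quorum members are picked at random, and failures and concurrent writers are ignored.

```python
import random


class Replica:
    def __init__(self):
        self.version = 0
        self.value = None


class QuorumStore:
    def __init__(self, n, read_quorum, write_quorum):
        # The two conditions from the list above: W > N/2 and R + W > N.
        if not (2 * write_quorum > n and read_quorum + write_quorum > n):
            raise ValueError("quorum sizes violate the overlap conditions")
        self.replicas = [Replica() for _ in range(n)]
        self.r = read_quorum
        self.w = write_quorum

    def read(self, rng=random):
        # Any R replicas overlap the last write quorum (R + W > N),
        # so the highest version seen is the latest one.
        sample = rng.sample(self.replicas, self.r)
        newest = max(sample, key=lambda rep: rep.version)
        return newest.version, newest.value

    def write(self, value, rng=random):
        # Any W replicas overlap the previous write quorum (2W > N),
        # so the quorum itself reveals the current version.
        quorum = rng.sample(self.replicas, self.w)
        new_version = max(rep.version for rep in quorum) + 1
        for rep in quorum:
            rep.version = new_version
            rep.value = value
        return new_version


store = QuorumStore(n=5, read_quorum=3, write_quorum=3)
first_version = store.write("data0")
```

With N = 5, the choice R = 3, W = 3 satisfies both inequalities; R = 2, W = 3 does not, because a read quorum could then miss every replica touched by the last write.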
Gossip Update
• Suited to situations with frequent reads and few
updates
• Increased performance
• Typically read-one, write-gossip
• Uses timestamps
Basic Gossip Update
• Used for overwrite updates
• Three operations: read, update, and gossip arrival
• Read: if TSfsa <= TSrm, the RM has recent data and returns
it; otherwise wait for gossip or try another RM
• Update: if TSfsa > TSrm, apply the update, advance TSrm, and send
gossip. Otherwise, handle it according to the application:
perform the update or reject it
• Gossip arrival: update the RM if the gossip carries new
updates.
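The three operations above might look like this in code. It is a sketch under assumptions: `GossipRM`, `outbox`, and `propagate` are hypothetical names, timestamps are plain integers, and a stale update is simply rejected (the slide leaves that choice to the application).

```python
class GossipRM:
    """Replica manager for overwrite-style updates with scalar timestamps."""
    def __init__(self):
        self.ts = 0        # TSrm: timestamp of the newest update applied here
        self.value = None
        self.outbox = []   # gossip messages waiting to be sent to peers

    def read(self, ts_fsa):
        # The RM is recent enough only if TSfsa <= TSrm.
        if ts_fsa <= self.ts:
            return self.value, self.ts
        return None        # caller waits for gossip or tries another RM

    def update(self, ts_fsa, value):
        if ts_fsa > self.ts:
            self.ts, self.value = ts_fsa, value
            self.outbox.append((ts_fsa, value))
            return True
        return False       # stale update: this sketch rejects it

    def on_gossip(self, ts, value):
        # Apply gossip only if it carries a newer update.
        if ts > self.ts:
            self.ts, self.value = ts, value


def propagate(src, peers):
    """Deliver all pending gossip from src to each peer (lazy propagation)."""
    for ts, value in src.outbox:
        for peer in peers:
            peer.on_gossip(ts, value)
    src.outbox.clear()


a, b = GossipRM(), GossipRM()
a.update(1, "v1")
before = b.read(1)   # b has not heard the gossip yet
propagate(a, [b])
after = b.read(1)
```

Until `propagate` runs, `b` cannot serve a client that has already seen timestamp 1, which is the price of lazy propagation.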
Causal Order Gossip Protocol[3]
• Used for read-modify updates
• Assumes a fixed RM configuration
• Uses vector timestamps
• Uses a buffer to preserve the order
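A common way to realize vector timestamps plus a hold-back buffer is sketched below. It illustrates the general causal-delivery idea rather than the exact protocol in [3]; `CausalRM` and its methods are hypothetical names, and the number of RMs is fixed at construction time.

```python
class CausalRM:
    def __init__(self, index, n):
        self.index = index
        self.clock = [0] * n   # vector timestamp, one entry per RM
        self.log = []          # updates applied, in causal order
        self.buffer = []       # gossip held back until it is deliverable

    def local_update(self, op):
        self.clock[self.index] += 1
        self.log.append(op)
        return (self.index, list(self.clock), op)   # gossip message

    def _deliverable(self, sender, ts):
        for k in range(len(ts)):
            if k == sender:
                # Must be the next update from the sender.
                if ts[k] != self.clock[k] + 1:
                    return False
            elif ts[k] > self.clock[k]:
                # Depends on an update from RM k not yet applied here.
                return False
        return True

    def on_gossip(self, message):
        self.buffer.append(message)
        progress = True
        while progress:        # one delivery may unblock buffered messages
            progress = False
            for msg in list(self.buffer):
                sender, ts, op = msg
                if self._deliverable(sender, ts):
                    self.log.append(op)
                    self.clock[sender] = ts[sender]
                    self.buffer.remove(msg)
                    progress = True


rm0, rm1, rm2 = CausalRM(0, 3), CausalRM(1, 3), CausalRM(2, 3)
m1 = rm0.local_update("create x")
rm1.on_gossip(m1)
m2 = rm1.local_update("modify x")   # causally after m1
rm2.on_gossip(m2)                   # arrives early, so it is buffered
held = list(rm2.log)
rm2.on_gossip(m1)                   # now both can be applied in order
```

Here `rm2` receives the modification before the creation it depends on, and the buffer restores the causal order.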
Windows Server 2008[4]
• Supports DFS
• “State-based, multi-master” scheduled replication
• Uses namespaces for transparent file sharing
• Uses Remote Differential Compression to propagate
only the changes, saving bandwidth
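The idea of propagating only the changes can be illustrated with a much simpler stand-in than Remote Differential Compression itself. The sketch below compares fixed-size block hashes; it is not the RDC algorithm, and `signatures`, `delta`, `patch`, and the tiny `BLOCK` size are assumptions chosen to keep the example readable.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow (assumption)


def signatures(data, block=BLOCK):
    """Hash of every fixed-size block of the receiver's old copy."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def delta(new, old_sigs, block=BLOCK):
    """Return (block index, bytes) for each block of `new` the receiver lacks."""
    changes = []
    for n, i in enumerate(range(0, len(new), block)):
        chunk = new[i:i + block]
        if n >= len(old_sigs) or hashlib.sha256(chunk).hexdigest() != old_sigs[n]:
            changes.append((n, chunk))
    return changes


def patch(old, changes, new_len, block=BLOCK):
    """Rebuild the new file from the old copy plus the changed blocks."""
    blocks = [old[i:i + block] for i in range(0, len(old), block)]
    for n, chunk in changes:
        if n < len(blocks):
            blocks[n] = chunk
        else:
            blocks.append(chunk)
    return b"".join(blocks)[:new_len]


old = b"aaaabbbbcccc"
new = b"aaaaXXXXcccc"
changes = delta(new, signatures(old))
rebuilt = patch(old, changes, len(new))
```

Only the middle block crosses the network here; the receiver reuses the two blocks it already holds.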
FUTURE WORK IN FILE REPLICATION
DISTRIBUTED FILE SYSTEMS: STATE OF THE ART
• GFS: Google File System
–Google
–C/C++
• HDFS: Hadoop Distributed File System
–Yahoo
–Java, Open Source
• Sector: Distributed Storage System
–University of Illinois at Chicago
–C++, Open Source
FILE SYSTEMS OVERVIEW
• System that permanently stores data
• Usually layered on top of a lower-level physical storage
medium
• Divided into logical units called “files”
–Addressable by a filename (“foo.txt”)
–Usually supports hierarchical nesting (directories)
• A file path joins file & directory names into a relative or
absolute address to identify a file (“/home/aaron/foo.txt”)
SHARED/PARALLEL/DISTRIBUTED FILE SYSTEMS
• Support access to files on remote servers
• Must support concurrency
–Make varying guarantees about locking, who “wins” with
concurrent writes, etc...
–Must gracefully handle dropped connections
• Can offer support for replication and local caching
• Different implementations sit in different places on
complexity/feature scale
SECTOR AND SPHERE
• Sector: Distributed Storage System
• Sphere: Run-time middleware that supports
simplified distributed data processing.
• Open source software, GPL, written in C++.
• Started in 2006; current version 1.18
• http://sector.sf.net
Sector: Brief Definition
Sector is an open source data cloud model.
Its assumptions: high-bandwidth data links exist
among the racks in a data center and
also among different data centers, and
individual applications may have to process large
streams of data and produce equally large output
streams.
SECTOR: DISTRIBUTED STORAGE SYSTEMS
• Sector stores files on the native/local file system of each slave
node.
• Sector does not split files into blocks
–Pro: simple/robust, suitable for wide area
–Con: file size limit
• Sector uses replication for better reliability and availability
• The master node maintains the file system metadata. No
permanent metadata is needed.
• Topology aware
SECTOR: WRITE/READ
• Write is exclusive
• Replicas are updated in a chained manner: the client
updates one replica, and then this replica updates
another, and so on. All replicas are updated upon the
completion of a Write operation.
• Read: different replicas can serve different clients at the
same time. The replica nearest to the client is chosen
whenever possible.
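The chained update and nearest-replica read described above can be sketched as follows. This is not the Sector API: the `Slave` class, its `next_replica` link, the `nearest` helper, and the distance table are all hypothetical, and the chain is walked with a plain recursive call instead of network transfers.

```python
class Slave:
    """A slave node holding file replicas (hypothetical, in-memory only)."""
    def __init__(self, name, next_replica=None):
        self.name = name
        self.next_replica = next_replica
        self.files = {}

    def write(self, path, data):
        """Store the data, then forward it down the chain.

        Returns the names of the replicas updated, in order. The call only
        returns once the whole chain has been updated.
        """
        self.files[path] = data
        updated = [self.name]
        if self.next_replica is not None:
            updated += self.next_replica.write(path, data)
        return updated


def nearest(replicas, distance):
    """Pick the replica with the smallest distance to the client."""
    return min(replicas, key=distance)


s3 = Slave("s3")
s2 = Slave("s2", next_replica=s3)
s1 = Slave("s1", next_replica=s2)
order = s1.write("/data/f", b"payload")   # the client contacts only s1

# Hypothetical client-to-slave distances, e.g. derived from topology.
distances = {"s1": 30, "s2": 5, "s3": 12}
reader = nearest([s1, s2, s3], lambda s: distances[s.name])
```

The client sends the data once, and each replica carries part of the transfer load instead of the client writing to every replica itself.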
SECTOR: TOOLS AND API
• Supported file system operations: ls, stat, mv, cp, mkdir,
rm, upload, download
–Wildcard characters supported
• System monitoring: sysinfo.
• C++ API: list, stat, move, copy, mkdir, remove, open,
close, read, write, sysinfo.
Sector: Architecture
Sector manages its data with the
help of the following components.
Security Server: Authenticates the clients and the slave
servers.
Master Server: Contains file metadata and schedules the
work among the slave servers.
Slave Servers: Contain the datasets, divided amongst them
in the form of Linux files.
Sphere: A Brief Definition
Sphere is a programming model built
over the Sector data cloud architecture.
It falls under the Single Instruction, Multiple Data (SIMD)
category of Flynn’s taxonomy.
Sphere: Computing Model
Sphere identifies the individual records in a file
with the help of index files.
Each record or group of records is treated as an independent data
entity that can be processed in parallel by different slave nodes.
This is similar to data parallelism.
These slave nodes are managed by Sphere
Engines (SPEs), and the SPEs are scheduled by Sphere.
Multiple stages of SPEs can be coupled together.
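The record-plus-index idea can be sketched as below. This is not Sphere code: `build_index`, `process_record`, and `run_stage` are hypothetical names, the index is a list of (offset, length) pairs for newline-separated records, and a local thread pool stands in for SPEs running on slave nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(data, sep=b"\n"):
    """Return (offset, length) for every non-empty record in `data`."""
    index, start = [], 0
    for line in data.split(sep):
        if line:
            index.append((start, len(line)))
        start += len(line) + len(sep)
    return index


def process_record(record):
    """Example per-record function: count the words in one record."""
    return len(record.split())


def run_stage(data, index, func, workers=4):
    """Apply `func` to each indexed record independently, in parallel."""
    records = [data[off:off + n] for off, n in index]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, records))   # results keep record order


data = b"a b c\nd e\nf\n"
index = build_index(data)
counts = run_stage(data, index, process_record)
```

Because each record is located through the index alone, no worker needs to scan the file or coordinate with another worker.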
Processing
MAP REDUCE
Similarity to MapReduce
The Sphere model is quite similar
to MapReduce because both deal with data
parallelism and allow coupling of at least two stages of
processing elements.
MapReduce shuffles the data among the slave nodes in
the second stage, whereas Sphere allows the output of the
first stage to be distributed among different processing
nodes of the second stage.
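The two-stage coupling can be sketched with a word count in which the first stage assigns each output item to a bucket and the second stage processes each bucket independently. This is an illustration only: `stage_one`, `stage_two`, and the byte-sum bucket rule are assumptions, and both stages run locally rather than on separate nodes.

```python
from collections import defaultdict


def stage_one(records, num_buckets):
    """First stage: split records into words and route each word to a bucket."""
    buckets = defaultdict(list)
    for record in records:
        for word in record.split():
            # Deterministic routing: the same word always lands in the same
            # bucket, so one second-stage node sees all of its occurrences.
            bucket_id = sum(word.encode()) % num_buckets
            buckets[bucket_id].append(word)
    return buckets


def stage_two(bucket):
    """Second stage: count the words within a single bucket."""
    counts = defaultdict(int)
    for word in bucket:
        counts[word] += 1
    return dict(counts)


records = ["a b a", "b c"]
buckets = stage_one(records, num_buckets=2)
results = {}
for bucket in buckets.values():   # each bucket could go to a different node
    results.update(stage_two(bucket))
```

Routing by key in the first stage plays the role that the shuffle plays in MapReduce: it decides which second-stage node receives each piece of intermediate output.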
References
[1] Wikipedia; http://en.wikipedia.org/wiki/Atomicity
[2] M. T. Harandi; J. Hou (modified: I. Gupta); "Transactions with Replication"; http://www.crhc.uiuc.edu/~nhv/428/slides/repl-trans.ppt
[3] Randy Chow, Theodore Johnson; "Distributed Operating Systems & Algorithms"; 1998
[4] "Overview of the Distributed File System Solution in Microsoft Windows Server 2003 R2"; http://technet2.microsoft.com/WindowsServer/en/library/d3afe6ee3083-4950-a093-8ab748651b761033.mspx?mfr=true
[5] "Distributed File System Replication: Frequently Asked Questions"; http://technet2.microsoft.com/WindowsServer/en/library/f9b98a0f-c1ae-4a9f-9724-80c679596e6b1033.mspx?mfr=true
[6] http://code.google.com/edu/parallel/dsd-tutorial.html
[7] http://code.google.com/edu/parallel/mapreducetutorial.html
[8] http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en/us/papers/gfs-sosp2003.pdf
[9] http://arxiv.org/ftp/arxiv/papers/0809/0809.1181.pdf