DISTRIBUTED FILE SYSTEM SUMMARY

RANJANI SANKARAN
Outline
• Characteristics of DFS
• DFS Design and Implementation
• Transaction and Concurrency Control
• Data and File Replication
• Current Work
• Future Work
DFS Characteristics
Dispersion
• Dispersed Files
  – Location transparency
  – Location independence
• Dispersed Clients
  – Login transparency
  – Access transparency
Multiplicity
• Multiple Files
  – Replication transparency
• Multiple Clients
  – Concurrency transparency
Others (general)
• Fault Tolerance – crash of server or client, loss of messages
• Scalability – incremental file system growth
DFS STRUCTURE [3]
DFS Root – the top level; holds links to shared folders in a domain.
DFS Link – a share under the root; the link redirects to a shared folder.
DFS Replicas or Targets – identical shares on two servers can be grouped together as targets under one link.
MAPPING OF LOGICAL AND PHYSICAL FOLDERS [2]
DFS Design and Implementation
• Problems – file sharing and file replication
Files and File Systems
• File name – mapping a symbolic name to a unique file id (ufid or file handle), which is the function of the directory service.
• File attributes – ownership, type, size, timestamp, access authorization information.
• Data units – flat / hierarchical structure.
• File access – sequential, direct, indexed-sequential.
COMPONENTS IN A FILE SYSTEM
• Directory service – name resolution, addition and deletion of files
• Authorization service – capability and/or access control list
• File service – transaction service (concurrency and replication management) and basic service (read/write files, get/set attributes)
• System service – device, cache, and block management
Overview of FS Services
• DIRECTORY SERVICE – search, create, delete, and rename files; mapping and locating; list a directory; traverse the file system.
• AUTHORIZATION SERVICE – authorized access for security; read, write, append, execute, delete, and list operations.
• FILE SERVICE – transaction service; basic service: read, write, open, close, delete, truncate, seek.
• SYSTEM SERVICE – replication, caching, mapping of addresses, etc.
SERVICES and SERVERS
• Services are implemented by servers (possibly multiple servers per service).
• The client <-> server relationship is relative: a server for one service may act as a client of another.
File Mounting
• Attach a remote named file system to the client's file system hierarchy at the position pointed to by a path name.
• Once files are mounted, they are accessed using concatenated logical path names, without referencing either the remote hosts or local devices.
• Location transparent.
• The linked information (the mount table; see the sketch after the mounting types below) is kept until the file systems are unmounted.
• Different clients may perceive a different FS view:
  – To achieve a global FS view, the system administrator (SA) enforces mounting rules.
  – Mounting is restricted or allowed via the server's export file.
Types of Mounting
– Explicit mounting: clients make explicit mounting system calls whenever one is desired.
– Boot mounting: a set of file servers is prescribed and all mountings are performed at the client's boot time.
– Auto-mounting: mounting of the servers is done implicitly, on demand, when a file is first opened by a client.
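
As a minimal sketch (all names hypothetical, not from the slides), the client-side mount table can be modeled as a mapping from local mount points to (server, remote path) pairs; resolving a concatenated logical path is then a longest-prefix match, which is what makes access location transparent:

# Minimal mount-table sketch (hypothetical names, illustrative only).
MOUNT_TABLE = {}  # local mount point -> (server, remote path)

def mount(local_point, server, remote_path):
    MOUNT_TABLE[local_point] = (server, remote_path)

def unmount(local_point):
    MOUNT_TABLE.pop(local_point, None)  # linked info kept only until unmount

def resolve(path):
    # Longest-prefix match over mount points: the client uses one logical
    # path name and never names the remote host or local device itself.
    best = max((p for p in MOUNT_TABLE if path.startswith(p)),
               key=len, default=None)
    if best is None:
        return ("local", path)
    server, remote = MOUNT_TABLE[best]
    return (server, remote + path[len(best):])

mount("/mnt/projects", "fs1.example.com", "/export/projects")
print(resolve("/mnt/projects/dfs/notes.txt"))
# -> ('fs1.example.com', '/export/projects/dfs/notes.txt')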
Server Registration
• The mounting protocol is not transparent – the initial mounting requires knowledge of the location of the file servers.
• Server registration addresses this in one of two ways:
  – File servers register their services, and clients consult the registration server before mounting (a small registry sketch follows).
  – Clients broadcast mounting requests, and file servers respond to the clients' requests.
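
A minimal sketch of the registration alternative (hypothetical API, not from the source): servers advertise their exports to a registry, and clients look the server up there instead of hard-coding its location before mounting:

REGISTRY = {}  # export name -> server address (hypothetical registry service)

def register(export, server):          # file servers advertise their exports
    REGISTRY[export] = server

def lookup(export):                    # clients consult before mounting
    return REGISTRY.get(export)

register("/export/projects", "fs1.example.com")
print(lookup("/export/projects"))      # -> 'fs1.example.com'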
Stateful and Stateless File Servers
• Stateful file server: the server maintains state information about clients between requests.
• Stateless file server: when a client sends a request, the server carries out the request, sends the reply, and then removes from its internal tables all information about the request.
  – Between requests, no client-specific information is kept on the server.
  – Each request must be self-contained: full file name, offset, etc. (see the sketch below).
• State information may include:
  – Opened files and their clients
  – File descriptors and file handles
  – Current file position pointers and mounting information
  – Caches or buffers
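
The contrast can be made concrete with a small sketch (illustrative message and table formats, not from the source): the stateless server needs every request to carry the full file name and offset, while the stateful server keeps an open-file table and a current position per handle:

def stateless_read(storage, request):
    # The request is self-contained: full file name, offset, and count.
    data = storage[request["file"]]
    return data[request["offset"]: request["offset"] + request["count"]]

class StatefulServer:
    def __init__(self, storage):
        self.storage = storage
        self.open_files = {}   # state: handle -> [file name, current position]
        self.next_handle = 0

    def open(self, name):
        self.next_handle += 1
        self.open_files[self.next_handle] = [name, 0]
        return self.next_handle

    def read(self, handle, count):
        name, pos = self.open_files[handle]
        data = self.storage[name][pos: pos + count]
        self.open_files[handle][1] += len(data)  # server advances the pointer
        return data

storage = {"/a.txt": b"hello world"}
print(stateless_read(storage, {"file": "/a.txt", "offset": 6, "count": 5}))
srv = StatefulServer(storage)
h = srv.open("/a.txt")
print(srv.read(h, 5), srv.read(h, 6))  # position remembered between requests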
File Access and Semantics of Sharing
• File sharing
  – Overlapping access: multiple copies of the same file
    • Cache or replication; space multiplexing
    • Coherency control: coherent view of shared files, managing access to replicas, atomic updates
  – Interleaved access: multiple granularities of data access operations
    • Time multiplexing
    • Simple read/write, transaction, session
    • Concurrency control: prevents erroneous or inconsistent results during concurrent access
Semantics of Sharing/Replication
• Unix semantics (currentness): writes are propagated immediately, so reads return the latest value.
• Transaction semantics (consistency): writes are stored and propagated when consistency constraints are met.
• Session semantics (efficiency): writes are done on a working copy; results are made permanent at session close (see the sketch below).
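
A minimal sketch of session semantics (hypothetical classes, illustrative only): writes go to a private working copy fetched at open, and only close() makes them permanent and visible to other clients:

class SessionFile:
    def __init__(self, server, name):
        self.server, self.name = server, name
        self.working_copy = server[name]   # private copy fetched at open

    def write(self, data):
        self.working_copy = data           # invisible to other clients

    def close(self):
        self.server[self.name] = self.working_copy  # publish at session close

server = {"/f": "v1"}
s = SessionFile(server, "/f")
s.write("v2")
print(server["/f"])  # still 'v1': efficiency is favored over currentness
s.close()
print(server["/f"])  # now 'v2'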
REPLICATION
• Write Policies
• Cache Coherence Control
• Version Control
Transaction and Concurrency Control
• A concurrency control protocol is required to maintain ACID semantics for concurrent transactions.
• Distributed transaction processing system:
  – Transaction manager: correct execution of local and remote transactions.
  – Scheduler: schedules operations to avoid conflicts, using locks, timestamps, and validation.
  – Object manager: coherency of replicas/caches; interface to the file system.
Transaction and Concurrency Control
Serializability
• A schedule is serializable if the result of its execution is equivalent to that of some serial schedule (no cyclic hold-and-wait deadlock situations, no holding of conflicting locks, etc.).
• Within transactions, the transaction states must remain consistent.
• Conflicts: write-write, write-read, and read-write operations on a shared object (a checker sketch follows).
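
The standard way to test this is a precedence graph, sketched below (a textbook construction, not spelled out in the slides): add an edge Ti -> Tj for each conflicting pair of operations where Ti's comes first, and the schedule is conflict-serializable iff the graph is acyclic:

def serializable(schedule):
    # schedule: list of (tx, op, obj), op in {"r", "w"}, in execution order.
    # Edge Ti -> Tj when an op of Ti conflicts with a later op of Tj on the
    # same object (write-write, write-read, or read-write).
    edges = set()
    for i, (t1, op1, o1) in enumerate(schedule):
        for t2, op2, o2 in schedule[i + 1:]:
            if o1 == o2 and t1 != t2 and "w" in (op1, op2):
                edges.add((t1, t2))
    # Kahn-style check: repeatedly remove transactions with no incoming edge;
    # if at some point none can be removed, the graph has a cycle.
    nodes = {t for t, _, _ in schedule}
    while nodes:
        sources = [n for n in nodes
                   if not any(d == n and s in nodes for s, d in edges)]
        if not sources:
            return False   # cycle: not serializable
        nodes -= set(sources)
    return True

print(serializable([("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]))  # False
print(serializable([("T1", "r", "x"), ("T1", "w", "x"), ("T2", "w", "x")]))  # True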
Interleaving Schedules
• Schedules (1,3) and (2,4) attempt to perform similar operations on data objects C and D.
• Only the (1,2) and (3,4) order is valid.
Concurrency Control Protocols
• Two-phase locking:
  – Growing phase, shrinking phase
  – Sacrifices concurrency and sharing for serializability
  – Subject to circular wait (deadlock)
• Example:
  t0 : Write A = 100; Write B = 20
  t1 : Read A, Read B; 1. Write sum in C; 2. Write diff in D
  t2 : Read A, Read B; 3. Write sum in D; 4. Write diff in C
• Solution: release locks as soon as possible.
  Problem: rolling aborts, commit dependence.
• Solution: strict two-phase locking (sketched below).
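
A minimal lock-manager sketch of strict two-phase locking (illustrative, exclusive locks only): locks are acquired as needed during the growing phase, and all of them are released together at commit, which prevents rolling aborts and commit dependence at the cost of concurrency:

class Strict2PL:
    def __init__(self):
        self.locks = {}  # object -> transaction holding an exclusive lock

    def acquire(self, tx, obj):
        holder = self.locks.get(obj)
        if holder not in (None, tx):
            return False  # caller must wait (deadlock handling is separate)
        self.locks[obj] = tx
        return True

    def commit(self, tx):
        # The shrinking phase happens all at once, only at commit time.
        self.locks = {o: t for o, t in self.locks.items() if t != tx}

mgr = Strict2PL()
assert mgr.acquire("t1", "A") and mgr.acquire("t1", "B")
print(mgr.acquire("t2", "A"))  # False: t2 must wait until t1 commits
mgr.commit("t1")
print(mgr.acquire("t2", "A"))  # True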
Time Stamp Ordering
• Logical timestamps or counters; unique timestamps per transaction.
• Larger-TS transactions wait for smaller-TS transactions; smaller-TS transactions die and restart when confronting larger-TS transactions.
• Example: t0 (TS 50) < t1 (TS 100) < t2 (TS 200)
  t0 : Write A = 100; Write B = 20 -> completed
  t1 : Read A, Read B; 1. Write sum in C; 2. Write diff in D
  t2 : Read A, Read B; 3. Read sum in C; 4. Write diff in C
Time Stamp Ordering Concurrency Control
• RD and WR – logical timestamps of the last read and last write of an object.
• Tmin – the minimum tentative time of a pending write.
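
A sketch of the basic timestamp-ordering rules using RD and WR (standard rules, simplified: Tmin and tentative writes are omitted): a read is rejected if a younger transaction has already written the object, and a write is rejected if a younger transaction has already read or written it; a rejected transaction restarts with a new timestamp:

class TOObject:
    def __init__(self):
        self.rd = 0        # largest TS that has read the object (RD)
        self.wr = 0        # largest TS that has written the object (WR)
        self.value = None

    def read(self, ts):
        if ts < self.wr:
            raise Exception("too late: a younger tx already wrote; restart")
        self.rd = max(self.rd, ts)
        return self.value

    def write(self, ts, value):
        if ts < self.rd or ts < self.wr:
            raise Exception("too late: a younger tx read/wrote; restart")
        self.wr = ts
        self.value = value

a = TOObject()
a.write(50, 100)    # t0 (TS 50) writes A = 100
print(a.read(100))  # t1 (TS 100) reads A
try:
    a.write(60, 0)  # a tx with TS 60 < RD 100 must die and restart
except Exception as e:
    print(e)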
Optimistic Concurrency Control
• Allows the entire transaction to complete, then validates it before making its effects permanent.
• Phases: execution phase, validation phase, update phase (sketched below).
• Validation: a two-phase commit protocol, sending validation requests to all transaction managers (TMs).
• Validated updates are committed in the update phase.
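
A single-node sketch of the three phases (illustrative; the distributed validation via two-phase commit across TMs from the slides is omitted): reads record the version they saw, writes are buffered, and commit validates the read set before installing the write set:

class OCCStore:
    def __init__(self):
        self.data = {}      # obj -> committed value
        self.version = {}   # obj -> commit version

    def begin(self):
        return {"reads": {}, "writes": {}}

    def read(self, tx, obj):
        tx["reads"][obj] = self.version.get(obj, 0)  # remember version seen
        return tx["writes"].get(obj, self.data.get(obj))

    def write(self, tx, obj, value):
        tx["writes"][obj] = value   # buffered during the execution phase

    def commit(self, tx):
        # Validation phase: fail if anything read was committed over since.
        if any(self.version.get(o, 0) != v for o, v in tx["reads"].items()):
            return False            # caller restarts the transaction
        for obj, value in tx["writes"].items():   # update phase
            self.data[obj] = value
            self.version[obj] = self.version.get(obj, 0) + 1
        return True

store = OCCStore()
t1, t2 = store.begin(), store.begin()
store.read(t1, "A"); store.write(t1, "A", 1)
store.read(t2, "A"); store.write(t2, "A", 2)
print(store.commit(t1))  # True
print(store.commit(t2))  # False: t2's read of A is stale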
Data and File Replication
• For concurrent access and availability.
• GOAL – one-copy serializability:
  – The execution of transactions on replicated objects is equivalent to the execution of the same transactions on non-replicated objects.
  – Read operations: read-one-primary, read-one, read-quorum.
  – Write operations: write-one-primary, write-all, write-all-available, write-quorum, write-gossip.
• Quorum voting
• Gossip update propagation
• Causal-order gossip protocol
ARCHITECTURE
• A client chooses one or more file service agents (FSAs) to access data objects.
• The FSA acts as a front end to the replica managers (RMs) to provide replication transparency.
• The FSA contacts one or more RMs for the actual updating and reading of data objects.
Quorum Voting/Gossip Update Propagation
• Quorum voting: uses a read quorum and a write quorum (see the sketch below).
  – Write-write conflict: 2 × write quorum > all object copies.
  – Read-write conflict: write quorum + read quorum > all object copies.
• Gossip update propagation:
  – Read: if TSfsa <= TSrm, the RM has sufficiently recent data; return it. Otherwise wait for gossip, or try another RM.
  – Update: if TSfsa > TSrm, perform the update, update TSrm, and send gossip. Otherwise, process based on the application: perform the update or reject it.
  – Gossip: update the RM if the gossip message carries new updates.
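
A minimal quorum-voting sketch (hypothetical parameters N = 5, W = R = 3, not from the slides): the two constraints force any read quorum to intersect the latest write quorum, so returning the highest-version copy found in the read quorum is safe:

import random

N, W, R = 5, 3, 3                  # copies, write quorum, read quorum
assert 2 * W > N and R + W > N     # write-write and read-write constraints

replicas = [{"version": 0, "value": None} for _ in range(N)]

def write(value):
    # A real system would learn the highest version by reading a quorum;
    # scanning all copies here keeps the sketch short.
    new_version = max(r["version"] for r in replicas) + 1
    for r in random.sample(replicas, W):   # any W copies form a write quorum
        r["version"], r["value"] = new_version, value

def read():
    quorum = random.sample(replicas, R)    # any R copies form a read quorum
    # R + W > N guarantees overlap with the latest write quorum, so the
    # highest-version copy in the read quorum is current.
    return max(quorum, key=lambda r: r["version"])["value"]

write("v1")
write("v2")
print(read())   # always 'v2'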
Gossip Update Protocol
• Used in a fixed RM configuration.
• Uses vector timestamps; uses a buffer (log) to preserve update order (sketched below).
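
A sketch of gossip propagation with vector timestamps in a fixed RM configuration (hypothetical classes; causal ordering of buffered updates is simplified to log merging): an FSA read succeeds only when the RM's timestamp dominates the client's, and gossip merges timestamps element-wise:

def dominates(ts_a, ts_b):
    return all(a >= b for a, b in zip(ts_a, ts_b))

class RM:
    def __init__(self, rm_id, n_rms):
        self.id = rm_id
        self.ts = [0] * n_rms   # vector timestamp, one slot per RM
        self.log = []           # buffered updates, gossiped to peers

    def update(self, op):
        self.ts[self.id] += 1
        self.log.append((list(self.ts), op))
        return list(self.ts)    # returned to the FSA as its new TSfsa

    def read(self, ts_fsa):
        # Serve the read only if this RM has seen everything the FSA has;
        # otherwise the FSA waits for gossip or tries another RM.
        return dominates(self.ts, ts_fsa)

    def gossip_from(self, other):
        for entry in other.log:            # take over missed updates
            if entry not in self.log:
                self.log.append(entry)
        self.ts = [max(a, b) for a, b in zip(self.ts, other.ts)]

rm0, rm1 = RM(0, 2), RM(1, 2)
ts_fsa = rm0.update("write x")
print(rm1.read(ts_fsa))   # False: rm1 has not seen the update yet
rm1.gossip_from(rm0)
print(rm1.read(ts_fsa))   # True after gossip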
Current Work
Here are some links to current distributed file system and related projects:
• Ceph: http://ceph.newdream.net/ (a petabyte-scale DFS that is POSIX-compatible and fault-tolerant)
• GlusterFS: http://www.gluster.org/
• HDFS: http://hadoop.apache.org/hdfs/
• HekaFS: http://www.hekafs.org/
• OrangeFS: http://www.orangefs.org/ and http://www.pvfs.org/
• KosmosFS: http://code.google.com/p/kosmosfs/
• MogileFS: http://danga.com/mogilefs/
• Swift (OpenStack Storage): http://www.openstack.org/projects/storage/
• FAST'11 proceedings: http://www.usenix.org/events/fast11/tech/
Future Work
• Usability/scalability issues relate to the costs of traversal in distributed file systems, as the traditional model of file traversal may not be suitable for searching/indexing [3].
• File systems adding support for their own indexing (continuous/incremental updates of indexes).
• The NFS family might become increasingly irrelevant for more geographically distributed enterprises.
• Innovations in the area of multi-tenancy and security for distributed/cloud computing.
References
1. R. Chow and T. Johnson, Distributed Operating Systems & Algorithms, 1997.
2. http://www.windowsnetworking.com/articles_tutorials/Implementing-DFS-Namespaces.html – DFS Namespaces reference.
3. http://www.quora.com/Distributed-Systems/What-is-the-future-of-file-systems – Future of file systems.
4. http://www.cs.iit.edu/~iraicu/research/publications/2011_LSAP2011_exascale-storage.pdf – Issues with DFS at exascale.
5. http://www.usenix.org/publications/login/2010-08/openpdfs/maltzahn.pdf – Ceph as a scalable alternative to Hadoop.
6. http://www-sop.inria.fr/members/Patrick.Valduriez/pmwiki/Patrick/uploads/Conferences/dexa2011.pdf – Distributed data management in 2020?
7. http://www.greenplum.com/media-center/big-data-use-cases/agile-analytics – Hadoop might become the future solution.
THANK YOU