Distributed File Systems
Presentation By: Group 4
Deepti Goel

Topics To Be Covered
• Distributed File System [Brief Discussion]
• NFS Basic Structure
• AFS Basic Organization
• Differences between NFS and AFS
• Distributed Cache Systems
• Oracle’s Distributed Database Systems
• Operating System Level Support For Coherence In Distributed Systems [PAPER]

Distributed File System
• A distributed implementation of the classical time-sharing model of a file system, where multiple users share files and storage resources.
• The performance of a DFS is measured by the amount of time needed to satisfy service requests.
• The multiplicity and dispersion of its servers and storage devices should be made invisible to clients.
• A transparent DFS facilitates user mobility by bringing the user environment to wherever the user logs in.

Naming and Transparency
• Naming is the mapping between logical and physical objects.
• The textual name is mapped to a lower-level numerical identifier, which in turn is mapped to disk blocks.
• Two notions regarding mapping in a DFS:
  – Location Transparency: the name of the file does not reveal any hint of the file’s physical storage location.
  – Location Independence: the name of the file does not need to be changed when the file’s physical storage location changes.
• The other transparency requirements of a DFS include Access, Concurrency, Failure, Performance, Migration and Replication Transparency.

Naming schemes in DFS
• Files are named by some combination of their host name and local name.
• Remote directories are attached to local directories, giving the appearance of a coherent directory tree. This scheme was popularized by Sun’s NFS.

Protocols/FS using DFS
• NETWORK FILE SYSTEM: The most commonly used versions of NFS are versions 2 and 3. NFS version 3 contains several features to improve performance, reduce server load, and reduce network traffic.
• CODA FILE SYSTEM: An experimental file system, developed in the group of M. Satyanarayanan at Carnegie Mellon University since 1987.
• SERVER MESSAGE BLOCK (SMB): This protocol is sometimes also referred to as the Common Internet File System (CIFS), LAN Manager or NetBIOS protocol. IBM and Microsoft developed it.
• APPLETALK: A local area network communication protocol originally created for Apple computers.
• NETWARE: Novell has redesigned (or at least re-featured) NetWare to work successfully as part of larger and heterogeneous networks, including the Internet.
• REMOTE FILE SHARING (RFS): RFS groups hosts into domains to facilitate the mounting of file systems. It is similar to NFS in most respects.

Sun’s Network File System (NFS)
• Architecture:
  – NFS is a collection of protocols that provide clients with a distributed file system.
  – Remote Access Model (as opposed to Upload/Download Model).
  – Every machine can be both a client and a server.
  – Servers export directories for access by remote clients (defined in the /etc/exports file).
  – Clients access exported directories by mounting them remotely.
• Protocols:
  – Mounting
    • Client sends a path name and the server returns a file handle.
    • Static mounting (at boot-up) vs. automounting.
    • Hard mounting vs. soft mounting.
  – File and directory access
    • Servers are stateless (no OPEN/CLOSE calls).
Detailed information is available in PDF format on the faculty.tamu.edu site.

[Figure: Basic architecture]

Andrew File System (AFS)
Andrew File System was a distributed computing environment designed and implemented at Carnegie Mellon University starting in 1983. It was subsequently chosen as the DFS for an industry coalition.
Features:
• Uniform name space
• Location-independent file sharing
• Client-side caching with cache consistency
• Secure authentication by Kerberos
• Scalability
• Server-side caching in the form of replicas, with high availability through automatic switch-over to a replica if the source server is unavailable

Design Issues:
• Client Mobility: Clients are able to access any file in the shared name space from any workstation, but they may face some performance degradation when accessing files from workstations other than their own.
• Security: Authentication and secure transmission are based on the RPC paradigm.
• Protection: AFS provides access lists for protecting directories and the regular UNIX bits for file protection.
• Heterogeneity: Defining a clear interface is key to integrating the diverse workstation hardware and operating systems.

[Figure: Basic architecture]

Differences between AFS and NFS
• With NFS, different clients can mount the same file system in different places, whereas there is a single AFS file system for the planet.
• Unlike NFS, which uses /etc/filesystems (on a client) to map between a local directory name and a remote filesystem, AFS does its mapping (filename to location) at the server. This has the tremendous advantage of making the served file space location independent.
• For example, to move "/home" between servers under NFS, you would have to change the /etc/filesystems file on 20 clients and take "/home" off-line while you moved it. With AFS, you simply move the AFS volume(s) which constitute "/home" between the servers. You do this "online" while users are actively using files in "/home", with no disruption to their work.
• AFS is far more secure than NFS. It uses a special authentication system called Kerberos.

Distributed Cache Systems
The designers decided that providing an automatic coherence mechanism in the cache system was counter to their efficiency goals. Several distributed file systems that include some form of caching exist.

Sun Microsystems’ Network Disk
• The client workstation contains software that simulates a locally attached disk by building and transmitting command packets to the disk server.
• The server responds by transferring complete disk blocks.

Cedar File System (CFS)
• The Cedar experimental programming environment developed at the Xerox Palo Alto Research Center supports a distributed file system called CFS.
• Each of the Cedar workstations has a local disk, and this disk can be used for local private files or shared files copied from a remote file server.

The ITC Distributed File System
• Vice, the shared component of the distributed system, implements a distributed file system that allows sharing of files.
• Each client workstation has a local disk, which is used for private files or shared files from a Vice file server.

Sun Microsystems Network File System
Basic Features:
• Full sharing of remote files.
• Each entry in the cache has an associated timeout.
• Coherence between client caches is achieved by ensuring that each client is coherent with the server’s cache.

Apollo DOMAIN
• The Apollo DOMAIN operating system embodies a distributed file system that allows location-transparent access of objects.
• The distributed file system does nothing to guarantee cache coherence between nodes.
• Client programs are required to use locking primitives provided by the operating system to maintain consistency of access.

Distributed Database Concepts
• A distributed database is a set of databases stored on multiple computers that typically appears to applications as a single database.
• An application can simultaneously access and modify the data in several databases in a network.

Transparency in a Distributed Database System
The goal of transparency is to make a distributed database system appear as though it is a single Oracle database. The following sections explain more about transparency in a distributed database system.

Location Transparency
Location transparency exists when a user can universally refer to a database object such as a table. Location transparency has several benefits, including:
• Access to remote data is simple.
• Administrators can move database objects with no impact on end-users or existing database applications.

Statement and Transaction Transparency
Oracle's distributed database architecture also provides query, update, and transaction transparency.

Replication Transparency
Oracle also provides many features to transparently replicate data among the nodes of the system.

Distributed Database Security
Oracle supports, for distributed database systems, all of the security features that are available in a non-distributed database environment, including:
• password or external service authentication for users and roles
• login packet encryption for client-to-server and server-to-server connections

Supporting User Accounts and Roles
When creating the database links for the nodes in a distributed database system, determine which user accounts and roles each site needs in order to support the server-to-server connections that use the links.

Global Users and Roles
The use of a global authentication service is a common technique for simplifying security management in distributed environments. In an Oracle client/server or distributed database environment, there are two options to support global authentication for users and roles:
• Oracle Security Server.
• A non-Oracle authentication service, when global database user and role authentication must work within the framework of that service.

Data Encryption
Data encryption protects data from unauthorized viewing by using the Data Encryption Standard (DES) encryption algorithm.

Tools for Administering Oracle Distributed Databases

Enterprise Manager
The graphical component of Enterprise Manager (Enterprise Manager/GUI) allows you to perform database administration tasks with the convenience of a graphical user interface (GUI).

Third-Party Administration Tools
Currently more than 60 companies produce more than 150 products that help manage Oracle databases and networks, providing a truly open environment.

SNMP Support
Oracle Simple Network Management Protocol (SNMP) support allows an Oracle server to be located and queried by any SNMP-based network management system.

Operating System level support for coherence in Distributed Systems
Two key issues:
• The problem of disseminating rollback
• Computational progress does not occur monotonically

Coherence
The ideal coherence mechanism would have certain properties:
• local: never unnecessarily demanding a global view of the system
• adaptive: the coherence premium would depend only on the actual incoherence potential of a given computation
• homogeneous: not requiring or expecting any particular topology, either physical or logical
• live: avoiding deadlock
The best way to achieve these properties is to exercise coherence control through an optimistic mechanism. Such a mechanism relies on the ability to perform rollback in order to extricate the system from incoherent states.

The Architecture
• The principal service offered by the architecture is support for coherence via identifiable units of computation, which we call transactions.
• The transaction service relies on a rollback service, which in turn relies on communications and stable storage services, as shown in the Figure.
• The transaction module gives access to the rollback service in order to support coherence control, and is therefore the point from which rollback is initiated.
• The diffusion of transactions makes it important not to roll a transaction back more than is necessary to resolve the immediate conflict.

Modular Architecture:
• Transaction Management
• Rollback checkpointing
• Communications
• Stable Storage

Communications
Distributed systems enable communication by having a reliable monotonic message delivery service, asynchronous send and receive semantics, and an unconstrained system address space.

Stable Storage
This module provides the stable virtual memory that supports rollback.

The Rollback Engine
• The second-level module is the rollback engine.
• Rollback is performed using the Time Warp scheme of "unsending" messages.
• Each message has a corresponding antimessage which, when sent to the same target process, serves to cancel the original positive message.

Coda-Version of AFS
What is Coda?
Coda is a distributed filesystem with its origin in AFS2. It has many features that are very desirable for network filesystems.
Features:
• disconnected operation for mobile computing
• freely available under a liberal license
• high performance through client-side persistent caching
• server replication
• security model for authentication, encryption and access control
• continued operation during partial network failures in the server network
• network bandwidth adaptation
• good scalability
• well-defined semantics of sharing, even in the presence of network failures

Current activities on Coda
To further develop and adapt the system for wider use, future research will emphasize:
• reliability and performance
• ports to important platforms
• documentation, mailing groups
• extensions in functionality
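
Example: cancelling a message with an antimessage
The "unsending" idea described under The Rollback Engine can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not the architecture's actual implementation: the names Message, Process, msg_id, receive and run are hypothetical, the process state is a plain integer sum of payloads, and virtual-time management and stable storage are left out. It shows the two cases an antimessage can meet at its target: the positive message is still waiting (the pair simply cancels), or it has already been processed (the process restores the state saved before that message and re-queues the later messages).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    """Hypothetical message record; sign +1 is positive, -1 is an antimessage."""
    msg_id: int
    timestamp: int
    payload: int
    sign: int = +1

    def anti(self) -> "Message":
        # The antimessage is identical except for its sign.
        return Message(self.msg_id, self.timestamp, self.payload, -self.sign)


@dataclass
class Process:
    """Toy process whose state is the sum of the payloads it has processed."""
    state: int = 0
    pending: list = field(default_factory=list)    # received, not yet processed
    processed: list = field(default_factory=list)  # (message, state_before) pairs

    def receive(self, msg: Message) -> None:
        if msg.sign > 0:
            self.pending.append(msg)
            return
        # Case 1: the positive message is still pending, so the pair cancels.
        for m in self.pending:
            if m.msg_id == msg.msg_id:
                self.pending.remove(m)
                return
        # Case 2: it was already processed, so roll back to the saved state
        # and re-queue every message that was processed after it.
        for i, (m, saved_state) in enumerate(self.processed):
            if m.msg_id == msg.msg_id:
                self.state = saved_state
                self.pending.extend(later for later, _ in self.processed[i + 1:])
                del self.processed[i:]
                return
        # Assumption: an antimessage with no matching positive is ignored here.

    def run(self) -> None:
        # Process pending messages in timestamp order, saving state before each.
        for msg in sorted(self.pending, key=lambda m: m.timestamp):
            self.processed.append((msg, self.state))
            self.state += msg.payload
        self.pending.clear()


if __name__ == "__main__":
    p = Process()
    first, second = Message(1, 10, 5), Message(2, 20, 7)
    p.receive(first)
    p.receive(second)
    p.run()
    p.receive(first.anti())  # "unsend" the first message after it was processed
    p.run()
    print(p.state)
```

Saving the state before each message keeps the rollback no deeper than the cancelled message itself, which matches the earlier point that a transaction should not be rolled back more than is necessary to resolve the immediate conflict.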