Distributed File Systems - Lyle School of Engineering

Distributed File Systems
Presentation By:
Group 4
Deepti Goel
Topics To Be Covered
 Distributed File System [Brief Discussion].
 NFS Basic Structure
 AFS Basic Organization
 Differences between NFS and AFS.
 Distributed Cache Systems.
 Oracle’s Distributed Database Systems.
 Operating System Level Support For
Coherence In Distributed Systems. [PAPER]
Distributed File System
A distributed file system (DFS) is a distributed implementation of the
classical time-sharing model of a file system, where multiple users
share files and storage resources.
A DFS is judged on two criteria:
 Performance: the amount of time needed to satisfy service
requests.
 Transparency: the multiplicity and dispersion of its servers and
storage devices should be invisible to clients.
Transparent DFS facilitates user mobility by bringing the
user environment to wherever the user logs in.
Naming and Transparency
Naming is the mapping between logical and physical objects. The textual
name is mapped to a lower-level numerical identifier that in turn is
mapped to the disk blocks. Two notions regarding mapping in a DFS are:
 Location Transparency : The name of the file does not reveal any hint
of the file’s physical storage location.
 Location Independence: The name of the file does not need to be
changed when the file’s physical storage location changes.
Among the other transparency requirements of DFS are Access,
Concurrency, Failure, Performance, Migration and Replication
Transparency.
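The two-level mapping described above can be sketched in Python. This is only an illustration: the class NameService, the server names, and the block numbers are hypothetical and do not come from any particular DFS. It shows why location independence follows from the indirection: moving a file changes the second-level mapping only, so the textual name stays the same.

```python
class NameService:
    """Hypothetical two-level naming: name -> numeric id -> physical location."""

    def __init__(self):
        self.name_to_id = {}      # textual name -> location-independent identifier
        self.id_to_location = {}  # identifier -> (server, disk blocks)
        self._next_id = 1

    def create(self, name, server, blocks):
        file_id = self._next_id
        self._next_id += 1
        self.name_to_id[name] = file_id
        self.id_to_location[file_id] = (server, list(blocks))
        return file_id

    def resolve(self, name):
        # First level: name -> id. Second level: id -> location.
        return self.id_to_location[self.name_to_id[name]]

    def migrate(self, name, new_server, new_blocks):
        # Only the second-level mapping changes; the name is untouched.
        self.id_to_location[self.name_to_id[name]] = (new_server, list(new_blocks))


ns = NameService()
ns.create("/home/alice/notes.txt", "server-a", [10, 11])
before = ns.resolve("/home/alice/notes.txt")
ns.migrate("/home/alice/notes.txt", "server-b", [7, 8, 9])
after = ns.resolve("/home/alice/notes.txt")
print(before, after)
```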
Naming schemes in DFS
 Files are named by some combination of their host name and local
name.
 Remote directories are attached to local directories thus giving the
appearance of a coherent directory tree. This scheme is provided and
popularized by Sun’s NFS.
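The second scheme (attaching remote directories to local ones) can be sketched as a mount table lookup. The function resolve, the host names, and the export paths below are assumptions made for illustration, and mount points are assumed to be written without a trailing slash. The longest matching mount point wins, which is what gives the appearance of a single directory tree.

```python
def resolve(path, mount_table):
    """Map a local path to (host, remote_path) using the longest matching mount point."""
    matches = [m for m in mount_table if path == m or path.startswith(m + "/")]
    if not matches:
        return ("localhost", path)  # not under any mount point: a local file
    mount_point = max(matches, key=len)
    host, remote_dir = mount_table[mount_point]
    return (host, remote_dir + path[len(mount_point):])


# Hypothetical mount table: local mount point -> (server, exported directory)
mount_table = {
    "/home": ("server2", "/export/home"),
    "/home/shared": ("server1", "/export/shared"),
}

print(resolve("/home/alice/a.txt", mount_table))
print(resolve("/home/shared/doc", mount_table))
print(resolve("/etc/passwd", mount_table))
```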
Protocols/FS using DFS
NETWORK FILE SYSTEM
The most commonly used versions of NFS are versions 2 and 3. NFS
version 3 contains several features to improve performance, reduce
server load, and reduce network traffic.
CODA FILE SYSTEM
Coda is an experimental file system, developed since 1987 in the group of M.
Satyanarayanan at Carnegie Mellon University.
SERVER MESSAGE BLOCK (SMB)
This protocol is sometimes also referred to as the Common Internet
File System (CIFS), LAN Manager or NetBIOS protocol. IBM and
Microsoft developed it.
APPLETALK
AppleTalk is a local area network communication protocol originally created for
Apple computers.
NETWARE
Novell has redesigned (or at least re-featured) NetWare to work
successfully as part of larger and heterogeneous networks, including
the Internet.
REMOTE FILE SHARING (RFS)
RFS groups hosts into domains for facilitating mounting of file
systems. It is similar to NFS in most respects.
Sun’s Network File System (NFS)
• Architecture:
– NFS is a collection of protocols that provide clients with a distributed
file system.
– Remote Access Model (as opposed to Upload/Download Model)
– Every machine can be both a client and a server.
– Servers export directories for access by remote clients (defined in the
/etc/exports file).
– Clients access exported directories by mounting them remotely.
• Protocols:
– mounting
• Client sends a path name and server returns a file handle.
• Static mounting (at boot-up) vs. automounting.
• Hard mounting vs. soft mounting
– file and directory access
• Servers are stateless (no OPEN/CLOSE calls)
Detailed information is available in PDF format on the faculty.tamu.edu site.
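The idea of a stateless server can be illustrated with a toy model. This is a sketch under assumptions, not the NFS wire protocol: the class StatelessFileServer and its method names are hypothetical, and the file handle is simply the hex-encoded path (real NFS handles are opaque values chosen by the server). The point is that the server keeps no open-file table, so every read carries the handle, the offset, and the byte count.

```python
import os
import tempfile


class StatelessFileServer:
    """Toy model: no OPEN/CLOSE, no per-client state kept between requests."""

    def __init__(self, export_root):
        self.export_root = export_root

    @staticmethod
    def _to_handle(path):
        return path.encode("utf-8").hex()  # assumption: handle is just an encoded path

    @staticmethod
    def _to_path(handle):
        return bytes.fromhex(handle).decode("utf-8")

    def mount(self, path_name):
        # Mount protocol: client sends a path name, server returns a file handle.
        return self._to_handle(os.path.join(self.export_root, path_name.lstrip("/")))

    def lookup(self, dir_handle, name):
        return self._to_handle(os.path.join(self._to_path(dir_handle), name))

    def read(self, handle, offset, count):
        # Each request is self-contained: handle, offset, and count.
        with open(self._to_path(handle), "rb") as f:
            f.seek(offset)
            return f.read(count)


root = tempfile.mkdtemp()
with open(os.path.join(root, "hello.txt"), "wb") as f:
    f.write(b"hello, nfs")

server = StatelessFileServer(root)
root_fh = server.mount("/")
file_fh = server.lookup(root_fh, "hello.txt")
first = server.read(file_fh, 0, 5)
rest = server.read(file_fh, 5, 100)
print(first, rest)
```

Because the client supplies the offset on every call, a server restart between the two reads would not lose any information the second read depends on.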
Basic architecture
Figure
Andrew File System (AFS)
The Andrew File System (AFS) grew out of Andrew, a distributed computing environment designed and
implemented at Carnegie Mellon University starting in 1983. It was subsequently
chosen as the DFS for an industry coalition.
Features:
 Uniform name space
 Location independent file sharing
 Client side caching with cache consistency
 Secure authentication by Kerberos
 Scalability
It includes server-side caching in the form of replicas, with high availability through
automatic switchover to a replica if the source server is unavailable.
Design Issues:
 Client Mobility: Clients are able to access any file in the shared name space from
any workstation, but they may face some performance degradation when
accessing files from workstations other than their own.
 Security: Authentication and secure transmission are based on the RPC paradigm.
 Protection: AFS provides access lists for protecting directories and the regular
UNIX bits for file protection.
 Heterogeneity: Defining a clear interface is the key to integrating diverse
workstation hardware and operating systems.
Basic architecture
Figure
Differences between AFS and NFS
 With NFS, different clients can mount the same file system in
different places, whereas there is one AFS file system for the
whole planet.
 Unlike NFS, which makes use of /etc/filesystems (on a client) to
map between a local directory name and a remote filesystem,
AFS does its mapping (filename to location) at the server. This
has the tremendous advantage of making the served file space
location independent.
 Using NFS, you would have to change the /etc/filesystems file
on 20 clients and take "/home" off-line while you moved it
between servers. With AFS, you simply move the AFS volume(s)
that constitute "/home" between the servers. You can do this on-line, while users are actively using files in "/home", with no
disruption to their work.
 AFS is far more secure than NFS. It uses a special
authentication system called Kerberos.
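The difference in where the mapping lives can be sketched as follows. This is an illustration only: the class VolumeLocationDB, the volume name, and the server names fs1 and fs2 are hypothetical, and real AFS clients also cache location information. Because clients ask the server side where a volume lives, moving "/home" is a single update there, with no per-client configuration change.

```python
class VolumeLocationDB:
    """Toy server-side map from path to volume to file server."""

    def __init__(self):
        self.mount_points = {}   # path prefix in the shared name space -> volume name
        self.volume_server = {}  # volume name -> server currently holding the volume

    def add_volume(self, volume, server, mount_point):
        self.mount_points[mount_point] = volume
        self.volume_server[volume] = server

    def move_volume(self, volume, new_server):
        # One update on the server side; no client file has to be edited.
        self.volume_server[volume] = new_server

    def locate(self, path):
        matches = [m for m in self.mount_points if path == m or path.startswith(m + "/")]
        if not matches:
            raise KeyError(path)
        return self.volume_server[self.mount_points[max(matches, key=len)]]


vldb = VolumeLocationDB()
vldb.add_volume("home", "fs1", "/home")

clients = ["client%d" % i for i in range(20)]  # clients hold no location config
before = {c: vldb.locate("/home/alice/notes.txt") for c in clients}
vldb.move_volume("home", "fs2")
after = {c: vldb.locate("/home/alice/notes.txt") for c in clients}
print(set(before.values()), set(after.values()))
```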
Distributed Cache Systems
Several distributed file systems that include some form
of caching exist. In some of them, the designers decided
that providing an automatic coherence mechanism in the
cache system was counter to their efficiency goals.
Sun Microsystems’ Network Disk
The client workstation contains software that simulates
a locally attached disk by building and transmitting
command packets to the disk server. The server
responds by transferring complete disk blocks.
Cedar file System (CFS)
The Cedar experimental programming environment
developed at the Xerox Palo Alto
Research Center supports a distributed file system
called CFS. Each of the Cedar workstations has a
local disk, and this disk can be used for local private
files or shared files copied from a remote file server.
The ITC Distributed File System
Vice, the shared component of the distributed system,
implements a distributed file system that allows
sharing of files. Each client workstation has a local
disk, which is used for private files or shared files
from a Vice file server.
Sun Microsystems Network File System
Basic Features:
 Full sharing of remote files.
 Each entry in the cache has an associated timeout.
 Coherence between client caches is achieved by
ensuring that each client is coherent with the server’s
cache.
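A timeout-based cache of this kind can be sketched as below. The classes Server and CachingClient are hypothetical, time is passed in explicitly as a logical clock, and revalidating against the server's modification time is an assumption about how an expired entry is checked. The sketch shows the consequence of the design: within the timeout a client may return stale data, and it becomes coherent with the server only when the entry expires.

```python
class Server:
    def __init__(self):
        self.files = {}  # name -> (data, modification time)

    def write(self, name, data, now):
        self.files[name] = (data, now)

    def getattr(self, name):
        return self.files[name][1]

    def read(self, name):
        return self.files[name]


class CachingClient:
    def __init__(self, server, timeout):
        self.server = server
        self.timeout = timeout
        self.cache = {}  # name -> (data, mtime, time of last validation)

    def read(self, name, now):
        entry = self.cache.get(name)
        if entry is not None:
            data, mtime, validated_at = entry
            if now - validated_at < self.timeout:
                return data  # entry still fresh: server is not contacted
            if self.server.getattr(name) == mtime:
                self.cache[name] = (data, mtime, now)  # unchanged: renew the entry
                return data
        data, mtime = self.server.read(name)  # miss or changed: fetch again
        self.cache[name] = (data, mtime, now)
        return data


server = Server()
server.write("f", b"v1", now=0)
client = CachingClient(server, timeout=3)
r1 = client.read("f", now=1)
server.write("f", b"v2", now=2)  # another client updates the file
r2 = client.read("f", now=3)     # within the timeout: stale data is returned
r3 = client.read("f", now=5)     # timeout expired: revalidated and refetched
print(r1, r2, r3)
```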
Apollo DOMAIN
 The Apollo DOMAIN operating system embodies a
distributed file system that allows location transparent
access of objects.
The distributed file system does nothing to guarantee
cache coherence between nodes. Client programs
are required to use locking primitives provided by the
operating system to maintain consistency of access.
Distributed Database Concepts
A distributed database is a set of databases stored on
multiple computers that typically appears to
applications as a single database. An application can
simultaneously access and modify the data in several
databases in a network.
Transparency in a Distributed
Database System
The goal of transparency is to make a distributed
database system appear as though it is a single
Oracle database. The following sections explain
more about transparency in a distributed database
system.
 Location Transparency
Location transparency exists when a user can
universally refer to a database object such as a table,
regardless of the node to which the application connects.
Location transparency has several benefits, including:
 Access to remote data is simple.
 Administrators can move database objects with no
impact on end-users or existing database applications.
Contd.
 Statement and Transaction Transparency
Oracle's distributed database architecture also provides
query, update, and transaction transparency.
 Replication Transparency
Oracle also provides many features to transparently
replicate data among the nodes of the system.
Distributed Database Security
Oracle supports all of the security features that are available with a
non-distributed database environment for distributed database
systems, including:
 password or external service authentication for users and
roles
 login packet encryption for client-to-server and server-to-server connections
 Supporting User Accounts and Roles
As we create the database links for the nodes in a distributed
database system, determine what user accounts and roles each
site needs to support server-to-server connections that use the
links.
Contd.
 Global Users and Roles
The use of a global authentication service is a common technique
for simplifying security management in distributed
environments. In an Oracle client/server or distributed database
environment, there are two options to support global
authentication for users and roles:
 Oracle Security Server.
 A non-Oracle authentication service, for cases where global
database user and role authentication must work within that
service's framework.
 Data Encryption
Data encryption protects data from unauthorized viewing by using the Data
Encryption Standard (DES) encryption algorithm.
Tools for Administering Oracle
Distributed Databases
 Enterprise Manager
The graphical component of Enterprise Manager (Enterprise
Manager/GUI) allows you to perform database administration
tasks with the convenience of a graphical user interface (GUI).
 Third-Party Administration Tools
Currently more than 60 companies produce more than 150
products that help manage Oracle databases and networks,
providing a truly open environment.
 SNMP Support
Oracle Simple Network Management Protocol (SNMP) support
allows an Oracle server to be located and queried by any
SNMP-based network management system.
Operating System level support for
coherence in Distributed Systems
Two key issues :
 Problem of disseminating rollback
 Computational progress does not occur monotonically
Coherence
The ideal coherence mechanism would have certain properties:
 local - never unnecessarily demanding a global view of the
system
 adaptive - the coherence premium would depend only on the
actual incoherence potential of a given computation
 homogeneous - not requiring or expecting any particular
topology, either physical or logical
 live - avoiding deadlock.
The best way to achieve these properties is to exercise
coherence control through an optimistic mechanism, which relies on
the ability to perform rollback in order to extricate the system
from incoherent states.
Contd.
The Architecture
The principal service offered by the architecture is
support for coherence via identifiable units of
computation, which we call transactions. The
transaction service relies on a rollback service, which
in turn relies on communications and stable storage
services, as shown in the figure.
The transaction module gives access to the rollback
service in order to support coherence control, and is
therefore the point from which rollback is initiated.
The diffusion of transactions makes it important not to
roll a transaction back more than is necessary to
resolve the immediate conflict.
Modular Architecture:
[Figure: layered modules, from top to bottom: Transaction Management; Rollback / Checkpointing; Communications and Stable Storage]
Contd.
Communications
This module enables communication in the distributed system by providing a reliable
monotonic message delivery service, asynchronous
send and receive semantics, and an unconstrained system
address space.
Stable Storage
This module provides the stable virtual memory that supports
rollback.
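How saved state supports a rollback that goes back no further than necessary can be sketched as below. The class CheckpointedState and its interface are hypothetical and are not taken from the paper; checkpoints are kept in memory here rather than in stable storage, and checkpoint times are assumed to increase. On a conflict at a given time, the state is restored from the latest checkpoint taken before that time.

```python
import bisect
import copy


class CheckpointedState:
    """Toy rollback support: restore the latest checkpoint before a conflict."""

    def __init__(self, state):
        self.state = state
        self.times = [0]                         # checkpoint times, increasing
        self.snapshots = [copy.deepcopy(state)]  # saved copies of the state

    def checkpoint(self, t):
        self.times.append(t)
        self.snapshots.append(copy.deepcopy(self.state))

    def rollback(self, conflict_time):
        # Index of the latest checkpoint taken strictly before the conflict.
        i = max(bisect.bisect_left(self.times, conflict_time) - 1, 0)
        self.state = copy.deepcopy(self.snapshots[i])
        # Later checkpoints describe a discarded future, so drop them.
        del self.times[i + 1:]
        del self.snapshots[i + 1:]
        return self.times[i]


s = CheckpointedState({"x": 0})
s.state["x"] = 1
s.checkpoint(5)
s.state["x"] = 2
s.checkpoint(10)
s.state["x"] = 3
restored_to = s.rollback(conflict_time=8)
print(restored_to, s.state)
```

Rolling back to time 5 rather than to the initial state keeps the work done before the conflict.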
The Rollback Engine
The second-level module is the rollback engine. Rollback is
performed using the Time Warp scheme of "unsending"
messages. Each message has a corresponding antimessage
which, when sent to the same target process, serves to cancel
the original positive message.
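The cancellation of a message by its antimessage can be sketched as follows. The names Message and Process are hypothetical, and the sketch covers only annihilation in the receiver's input queue. It leaves out the case where the positive message has already been processed, in which the receiver itself would have to roll back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: int
    send_time: int
    payload: str
    sign: int = 1  # 1 for a positive message, -1 for an antimessage

    def anti(self):
        # Same content, opposite sign.
        return Message(self.msg_id, self.send_time, self.payload, -self.sign)


class Process:
    def __init__(self):
        self.input_queue = []

    def receive(self, msg):
        twin = msg.anti()
        if twin in self.input_queue:
            self.input_queue.remove(twin)  # message and antimessage annihilate
        else:
            self.input_queue.append(msg)


target = Process()
m1 = Message(1, 10, "a")
m2 = Message(2, 12, "b")
target.receive(m1)
target.receive(m2)
target.receive(m2.anti())  # the sender rolled back and "unsends" m2
print(target.input_queue)
```

Because annihilation is symmetric in this sketch, an antimessage that arrives before its positive message waits in the queue and cancels the message on arrival.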
Coda-Version of AFS
What is Coda?
Coda is a distributed filesystem with its origin in AFS2. It has many
features that are very desirable for network filesystems.
Features:
 disconnected operation for mobile computing
 is freely available under a liberal license
 high performance through client side persistent caching
 server replication
 security model for authentication, encryption and access
control
 continued operation during partial network failures in
server network
 network bandwidth adaptation
 good scalability
 well defined semantics of sharing, even in the presence of
network failures
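Disconnected operation can be illustrated with a small sketch. The classes Server and Client are hypothetical and greatly simplified compared with Coda: while disconnected, the client works from its cache and logs its updates, and on reconnection the log is replayed to the server. Conflict detection, which a real system needs when the server copy also changed, is deliberately left out.

```python
class Server:
    def __init__(self):
        self.files = {}


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}       # client-side cache of file contents
        self.replay_log = []  # updates made while disconnected
        self.connected = True

    def disconnect(self):
        self.connected = False

    def read(self, name):
        if self.connected:
            self.cache[name] = self.server.files[name]
        return self.cache[name]  # served from the cache when disconnected

    def write(self, name, data):
        self.cache[name] = data
        if self.connected:
            self.server.files[name] = data
        else:
            self.replay_log.append((name, data))  # remember for later

    def reconnect(self):
        self.connected = True
        for name, data in self.replay_log:  # replay logged updates in order
            self.server.files[name] = data
        self.replay_log.clear()


server = Server()
client = Client(server)
client.write("a.txt", "v1")
client.disconnect()
client.write("a.txt", "v2")
client.write("b.txt", "new")
offline_read = client.read("a.txt")
server_during = dict(server.files)
client.reconnect()
print(offline_read, server_during, server.files)
```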
Current activities on Coda
To further develop and adapt the system for wider use,
future work will emphasize:
 reliability and performance
 ports to important platforms
 documentation, mailing groups
 extensions in functionality
End