Network File Systems

advertisement

Network File Systems

NFS

AFS

SMB

NFS General

http://en.wikipedia.org/wiki/Network_file_system

Network File System

(NFS)

Any computer file system that supports sharing of files and other resources as persistent storage over a computer network

First file servers were developed in the 1970s,

In 1985 Sun Microsystems created the Network File

System (NFS)

Became the first widely used distributed file system

Other notable distributed file systems

Andrew File System (AFS)

Server Message Block SMB

AKA CIFS

Distribution

Distribution

Distributed File System (DFS)

Network file system with:

Clients, servers, and storage devices

Spread throughout the machines of a distributed system or intranet

Service activity occurs across the network

System has multiple and independent storage devices

 Some DFS servers run on dedicated machines

 Others a machine can be both a server and a client

A DFS can be implemented:

As part of a distributed operating system

By a software layer

 Managing the communication between conventional operating systems and file systems

Distribution

Distinctive feature of a DFS:

System with many and autonomous clients and servers

Transparency

Transparency

A DFS should appear to its users to be a conventional, centralized file system

Multiplicity and dispersion of its servers and storage devices should be invisible

Client interface used by programs should not distinguish between local and remote files

The DFS locates the files and arranges the transport of data

Performance

Performance

Most important performance measurement of a

DFS:

Amount of time needed to satisfy service requests

Typical performance constraints:

Conventional systems:

Disk-access time

Small amount of CPU-processing time

DFS:

Additional overhead due to the distributed structure

Includes:

 Time to deliver the request to a server

 Time to deliver the response to the client

 For each direction, a CPU overhead of running the communication protocol software

Performance

The performance of a DFS can be viewed as one dimension of its transparency

Ideally, it would be comparable to that of a conventional file system

Concurrent file updates

Concurrent file updates

A DFS should allow multiple client processes on multiple machines to access and update the same files

Updates to the file from one client should not interfere with access and updates from other clients

Concurrency control or locking may be:

Built into the file system

Provided by an add-on protocol

Network File Systems

Sample Network File Systems

Client-server file systems

9P

Amazon S3

Coda

Secure File System (SFS)

Self-certifying File System (SFS)

Sample Network File Systems

Distributed file systems

Ceph

Andrew File System (AFS), OpenAFS

DCE Distributed File System

Remote file access protocol used with the Distributed Computing Environment

Google File System

File system that is optimized for Google's core storage needs

Hadoop

Java software framework that supports distributed applications running on large clusters of commodity computers

Kosmos FS

High performance distributed file system. It is also integrated with hadoop.

Lustre

Distributed file system for large scale cluster computing

Microsoft Distributed File System

Set of client and server services for organizing many distributed SMB file shares into a distributed file system

GlusterFS

Platform for developing clustering applications

Parallel Virtual File System

Parallel file system that focuses on high performance access to large data sets

NFS Protocol

http://en.wikipedia.org/wiki/Network_File_System_%28protocol%29

Network File System Protocol

Network file system protocol:

Allows a user on a client computer to access files over a network

As if the network devices were attached to its local disks

Originally developed by Sun Microsystems in 1984

NFS builds on the Open Network Computing

Remote Procedure Call (ONC RPC) system

The Network File System protocol is specified in

RFC 1094, RFC 1813, and RFC 3530 (which obsoletes

RFC 3010)

Versions and variations

Versions and variations

Version 1

Sun used only for in-house experimental purposes

Did not release it to the public

Versions and variations

Version 2 of the protocol

Originally operated entirely over UDP

Defined in RFC 1094

March 1989

Designers meant to keep the protocol stateless

Locking (for example) implemented outside of the core protocol

Versions and variations

Version 3 (RFC 1813, June 1995) added:

Support for 64-bit file sizes and offsets

Handle files larger than 4 gigabytes (GB)

Support for asynchronous writes on the server

Improve write performance

Additional file attributes in many replies

Avoid the need to re-fetch them

READDIRPLUS operation

Get file handles and attributes along with file names when scanning a directory

Assorted other improvements

Versions and variations

At the introduction of Version 3

Vendor support for TCP as a transport-layer protocol began increasing

Several vendors had already added support for

NFS Version 2 with TCP as a transport

Sun Microsystems added support for TCP as a transport for NFS at the same time it added support for Version 3

Using TCP as a transport made using NFS over a WAN more feasible

Versions and variations

Version 4 became the first version developed with the IETF

After Sun Microsystems handed over the development of the NFS protocols

Versions and variations

Version 4 includes:

Performance improvements

Mandates strong security

Introduces a stateful protocol

Defined by

RFC 3010, December 2000

Revised in RFC 3530, April 2003

Versions and variations

Various side-band protocols have become associated with NFS, including:

The byte-range advisory Network Lock

Manager (NLM) protocol

Added to support UNIX System V file-locking APIs

The remote quota reporting (RQUOTAD) protocol

Allow NFS-users to view their data-storage quotas on

NFS servers

Versions and variations

WebNFS

Allows NFS to

Integrate more easily into Web-browsers

Enable operation through firewalls

An extension to Version 2 and Version 3

Sun Microsystems open sourced their

WebNFS implementation

 via https://yanfs.dev.java.net/

Platforms

Platforms

NFS most commonly used with UNIX systems

Other software platforms such can also use the protocol

Mac OS, Microsoft Windows, Novell NetWare, and IBM

AS/400

Platforms

Alternative remote file access protocols include

Server Message Block (SMB, also known as CIFS) protocol

Apple Filing Protocol (AFP)

NetWare Core Protocol (NCP)

OS/400 File Server file system (QFileSvr.400)

SMB and NetWare Core Protocol (NCP) are most common on Microsoft Windows

AFP is most common on Macintosh systems

QFileSvr.400 is most common on AS/400 systems

Typical implementation

Typical implementation

Assume a Unix-style scenario: one machine (client) requires access to data stored on another machine (NFS server)

Server implements NFS daemon processes (running by default as nfsd) in order to make its data generically available to clients

Server administrator determines what to make available

Exporting the names and parameters of directories

Typically using the /etc/exports configuration file and the exportfs command

Server security-administration recognizes and approves validated clients

Server network configuration lets appropriate clients can negotiate with it through any firewall system

Client machine requests access to exported data

Issue a mount command

Users on the client machine then view and interact with mounted filesystems on the server within the parameters permitted

Note: automating the NFS mounting process is typical

E.g. using /etc/fstab and/or automounting facilities

Protocol development versus

Competing protocols

Protocol development versus competing protocols

1980s

 other.

During the development of the ONC protocol (called SunRPC at the time), only

 whenever two peers shared a common endianness in their machineformed network-computing environments.

Later, Sun and AT&T announced that the two firms would jointly develop AT&T's

System (RFS), and the quick adoption of NFS over RFS by Digital Equipment, HP,

NFS.

OSF solicited the proposals for various technologies, including the remote

Computing Environment (DCE), and the Distributed File System (DFS) won over including NCS and Kerberos . DFS used DCE as the RPC and derived from AFS.

Protocol development versus competing protocols

1990s

Sun Microsystems and the Internet Society (ISOC) reached an agreement to cede "change control" of ONC RPC so that ISOC's engineering-standards body, the Internet Engineering Task Force (IETF), could publish standards documents

(RFCs) documenting the ONC RPC protocols and could extend ONC RPC. OSF attempted to make DCE RPC an IETF standard, but ultimately proved unwilling to give up change-control. Later, the IETF chose to extend ONC RPC by adding a new authentication flavor, RPCSEC GSS , in order to meet IETF's requirements that protocol standards have adequate security.

Later, Sun and ISOC reached a similar agreement to give ISOC change control over NFS, although writing the contract carefully to exclude NFS version 2 and version 3. Instead, ISOC gained the right to add new versions to the NFS protocol, which resulted in IETF specifying NFS version 4 in 2003.

2000s

By the 21st century, neither DFS nor AFS had achieved any major commercial success as compared to CIFS or NFS. IBM, which had previously acquired the primary commercial vendor of DFS and AFS, Transarc , donated most of the AFS source code to the free software community in 2000. The OpenAFS project lives on. In early 2005, IBM announced end of sales for AFS and DFS.

Present

NFSv4.1 presents new opportunities for the NAS community

Primary addition is Parallel NFS (pNFS)

A method of introducing data access parallelism

Protocol defines a method of separating the meta-data (names and attributes) of a filesystem from the location of the file data

Goes beyond simple name/data separation

Defines method of striping the data amongst a set of data servers

Very different from the traditional NFS server

Holds the names of files and their data under the single umbrella of the server

Products exist that are multi-node NFS servers

Limited participation by the client in separation of meta-data and data

NFSv4.1 client can be enabled to be a direct participant in the exact location of file data

Avoid sole interaction with the single NFS server when moving data

Present

NFSv4.1 pNFS server:

A collection or community of server resources or components

Community members are controlled by the metadata server

pNFS client still accesses a single metadata server for traversal or interaction with the namespace

Client moves data to and from the server

Directly interact with the set of data servers belonging to the pNFS server community

Present

NFSv4.1 provides:

Sessions

Directory Delegation and Notifications

Multi-server Namespace

ACL/SACL/DACL

Retention Attributions

SECINFO_NO_NAME

AFS

http://en.wikipedia.org/wiki/Andrew_File

_System

Andrew File System

(AFS)

Distributed networked file system

Developed by Carnegie Mellon University

Part of the Andrew Project

Named for

Andrew Carnegie

Andrew Mellon

Primary used in distributed computing

Features

Features

AFS has several benefits over traditional networked file systems

Particularly in the areas of security and scalability

Not uncommon for enterprise AFS cells to exceed fifty thousand clients

Uses Kerberos for authentication

Implements access control lists on directories for users and groups

Each client caches files on the local filesystem for increased speed on subsequent requests for the same file

Allows limited filesystem access in the event of a server crash or a network outage

Features

Read and write operations on an open file are directed only to the locally cached copy

When a modified file is closed

Only the changed portions copied back to the file server

Cache consistency maintained by callback

When a file is cached

Server makes a note of this

Will inform the client if the file is updated by someone else

After any client, server, or network failure (including a time-out)

Callbacks are discarded and must be re-established

Re-establishing a callback involves:

Status check

Does not require re-reading the file itself

Features

Consequence of the whole file locking strategy

AFS does not support large shared databases or record updating within files shared between client systems

Deliberate design decision

Based on the perceived needs of the university computing environment

It leads to the use of a single file per message in the original email system for the Andrew Project, the

Andrew Message System, rather than the more customary single file per mailbox

Features

A significant feature of AFS is the volume

Tree of files

Sub-directories

AFS mountpoints

Links to other AFS volumes

Volumes are created by administrators

Linked at a specific named path in an AFS cell

Once created, users of the filesystem may create directories and files as usual without concern for the physical location of the volume

A volume may have a quota assigned to it in order to limit the amount of space consumed.

As needed, AFS administrators can move that volume to another server and disk location without the need to notify users

Moving can occur while files in that volume are even being used

Features

AFS volumes can be replicated to read-only cloned copies

When accessing files in a read-only volume, a client system will retrieve data from a particular read-only copy

If a copy becomes unavailable, clients will look for any remaining copies

Users of that data are unaware of the location of the readonly copy

Administrators can create and relocate copies as needed

AFS command suite guarantees that all read-only volumes contain exact copies of the original read-write volume at the time the read-only copy was created

Features

The file name space on an Andrew workstation is partitioned into a shared and local name space

Shared name space identical on all workstations

Usually mounted as /afs on the Unix filesystem

Local name space unique to each workstation

Contains only temporary files needed for workstation initialization and symbolic links to files in the shared name space

Features

Andrew File System

Heavily influenced Version 4 of Sun

Microsystems' NFS

Variant of AFS adopted by the Open Software

Foundation

Distributed File System (DFS)

Part of their Distributed computing environment

Implementations

Implementations

There are three major implementations

Transarc (IBM)

OpenAFS

Arla

Transarc software is losing support and is deprecated

AFS (version two) is also the predecessor of the

Coda file system.

A fourth implementation exists in the Linux kernel source code since at least version 2.6.10

Committed by Red Hat

A fairly simple implementation

Still in its early stages of development

Incomplete

Available permissions

Available permissions

The following Access Control List permissions can be granted:

Lookup (l)

 allows a user to list the contents of the AFS directory, examine the ACL associated with the directory and access subdirectories.

Insert (i)

 allows a user to add new files or subdirectories to the directory.

Delete (d)

 allows a user to remove files and subdirectories from the directory.

Administer (a)

 allows a user to change the ACL for the directory. Users always have this right on their home directory, even if they accidentally remove themselves from the ACL.

Available permissions

Permissions that affect files and subdirectories include:

Read (r)

 allows a user to look at the contents of files in a directory and list files in subdirectories. Files that are to be granted read access to any user, including the owner, need to have the standard UNIX "owner read" permission set.

Write (w)

 allows a user to modify files in a directory. Files that are to be granted write access to any user, including the owner, need to have the standard UNIX "owner write" permission set.

Lock (k)

 allows the processor to run programs that need to "flock" files in the directory. See the UNIX man page for "flock" for more details.

Available permissions

AFS includes Application ACLs (A)-(H) which have no effect on access to files.

SMB

http://en.wikipedia.org/wiki/Server_Message_Block

Server Message Block

(SMB)

Operates as an application-level network protocol

Applied to shared access to

Files, printers, serial ports, and miscellaneous communications

Between nodes on a network

Provides an authenticated Inter-process communication mechanism

Most usage of SMB involves computers running

Microsoft Windows:

Microsoft environments users often know it simply as

"Microsoft Windows Network"

Server Message Block

(SMB)

When discussing SMB, one should distinguish:

SMB the protocol

SMB services that run on the protocol

NetBIOS

DCE/RPC services that use SMB as an authenticated Inter-process communication channel (over named pipes)

"Network Neighborhood" protocols which primarily (but not exclusively) run as datagram services directly on the NetBIOS transport

History

History

Barry Feigenbaum originally invented SMB at IBM

Turn DOS "Interrupt 33" (21h) local file-access into a networked file-system

Microsoft made considerable modifications to the version used most commonly

Merged the SMB protocol with the LAN Manager

Developing with 3Com (circa 1990)

Continued to add features to the protocol

Windows for Workgroups (circa 1992)

Later versions of Windows.

History

Original design of SMB envisaged it running on top of the NetBIOS and

NetBEUI APIs

Typically implemented with

NBF

NetBIOS over IPX/SPX

NBT

SMB can also run directly on the TCP/IP protocols

Introduced with Windows 2000.

History

About when Sun Microsystems announced

WebNFS

Microsoft launched an initiative in 1996 to rename SMB to

Common Internet File System (CIFS)

Added more features, including

Support for symbolic links, hard links, larger file sizes-and an attempt at supporting direct connections without all the

NetBIOS trimmings

Largely experimental effort that required further refinement

Microsoft submitted some partial specifications as

Internet-Drafts to the IETF

These submissions have expired

History

Windows Vista introduced Server Message

Block 2.0

History

Side issue:

Samba project originated aim of

Reverse engineer SMB

Providing a free implementation of a compatible

SMB-client and server for use with non-Microsoft operating system

Driven by the importance of the SMB protocol in interacting with the widespread Microsoft

Windows platform

Implementation

Implementation

Client-server approach

SMB works through a peer-to-peer approach

Client makes specific requests and the server responds accordingly

One section of the SMB protocol specifically deals with access to filesystems, such that clients may make requests to a file server

Other sections of the SMB protocol specialize in interprocess communication (IPC)

Developers have optimized the SMB protocol for local subnet usage

Users have also put SMB to work to access different subnets across the Internet

 Involving file-sharing or print-sharing in MS Windows environments

Implementation

Client-server approach

SMB servers make their file systems and other resources available to clients on the network

Client computers may want access to the shared file systems and printers on the server

 For this primary function, SMB is best-known and most heavily used

SMB file-server aspect would count for little without the NT domains suite of protocols

 Provide NT-style domain-based authentication

NT Domains protocols offer MSRPC services available almost exclusively available on SMB IPC "named pipe“

 Almost all implementations of SMB servers use NT Domain authentication to validate user-access to resources

Implementation

Performance issues

Many people believe that the SMB protocol makes heavy use of network bandwidth because each client broadcasts its presence to the whole subnet.

SMB itself does not use broadcasts

 Broadcast problems commonly associated with SMB actually originate with the

NetBIOS service location protocol

Microsoft Windows server will use NetBIOS to advertise and locate services

NetBIOS functions by broadcasting services available on a particular host at regular intervals.

Usually ok for an acceptable default in a network with fewer than 20 hosts

 Broadcast traffic will cause problems as the number of hosts increases.

Proper implementation of a NetBIOS Name Server (NBNS) can mitigate this problem

 Windows Internet Naming Service (WINS) offers a suitable solution in Microsoft environments.

WINS uses a much more advanced system of registration and centralized service requests

 Imposes its own complexity upon the design and maintenance of the network

Microsoft recommends the use of Dynamic DNS

 In Microsoft Active Directory environments

Implementation

Performance issues

Network designers should expect that network latency will have a significant impact on the performance of the SMB protocol

Monitoring reveals this mostly due to navigating among directories through SMB

 When significant network latency exists between hosts

For example, a VPN connection over the Internet will often introduce network latency

 Can make for a frustrating experience

Implementation

Microsoft's modifications

Microsoft has added several extensions to its own SMB implementation.

Added NTLM Version 2

 NTLM version 1 used DES in a flawed manner

Derived from the original legacy SMB specification's requirement to use IBM "LanManager" passwords

NT 4.0 Domain Logon protocols use 40-bit encryption outside of the United States of America

 Does not conform with modern security standards

Implementation

Points of interest

SMB's "Inter-Process Communication" mechanism deserves a specific mention

SMB "IPC" system provides named pipes

 Provides one of the first few inter-process mechanisms commonly available to programmers

Provides a means for services to inherit the authentication carried out when a client first connected to an SMB server

 Inherited authentication in named pipes has become so ubiquitous and transparent that both Windows-users and programmers who use the Windows API often simply take it for granted

Implementation

Points of interest

Some services that operate over named pipes also allow MSRPC client programs to perform authentication

Over-rides the authorization provided by the SMB server

Only in the context of the MSRPC client program that successfully makes the additional authentication

Implementation

Points of interest

As another point of interest, SMB uses opportunistic locking

Special type of locking-mechanism

To improve performance

SMB serves as the basis for Microsoft's

Distributed File System implementation

Versions and implementations

Versions and implementations

The following list explicitly refers to "SMB" as including an SMB client or an SMB server

Including the various protocols that extend SMB

For simplicity and conciseness and vagueness, however, the list omits mention of the extent or completeness of the reimplementation or porting status for any of these implementations

Lumps them all together simply as "SMB"

Versions and implementations

List…

Samba , a free re-implementation of the SMB protocol and of the Microsoft extensions to it, includes an SMB server and a command-line SMB client.

Samba TNG : a fork of Samba.

The Linux kernel includes two SMB client implementations that use the

Linux VFS , providing access to files on an SMB server through the standard file system API : smbfs and cifs.

ONStor Inc.

offers an SMB implementation that also supports NFS protocol so users can access the same data through both protocols.

Novell NetWare version 6 and newer has a CIFS server implementation providing access to NetWare volumes for Microsoft Network clients.

FreeBSD includes an SMB client implementation called smbfs that uses its

VFS .

NetBSD and Mac OS X include SMB client implementations called smbfs, originally derived from the FreeBSD smbfs; they use the NetBSD and OS X

VFS.

Solaris has a project called CIFS client for Solaris, based on the Mac OS X smbfs.

Sun Microsystems added in-kernel CIFS support to Solaris in October 2007

Sun Microsystems Cascade , which became known as PC-Netlink , represents a port of Advanced Server for Unix . Sun took over two years making the code useful, due to the poor quality of the original port.

Versions and implementations

Continued…

FreeNAS , a dedicated small-sized NAS server, runs FreeBSD for Network-attached storage (NAS) services, and supports protocols including CIFS/Samba

Advanced Server for Unix (AS/U) comprises a port of Windows NT 3.51's SMB server code to Unix . Microsoft licensed the code to AT&T , which then licensed it to major

Unix vendors. The poor quality of the original port (allegedly[ citation needed ] carried out by Microsoft itself) has caused any vendor sub-licensing it significant grief.[ citation needed ]

VERITAS Software has an implementation of SMB.

SCO has a port of Advanced Server for Unix .

SCO also has VisionFS , a Microsoft-independent re-implementation of SMB developed through reference to Samba source code.[ citation needed ]

Network Appliance has an SMB server implementation

Objective Development 's Sharity provides an SMB file-system client for Unix.

The Alfresco content-management system includes a Java implementation of SMB

JCIFS offers an implementation of SMB in Java

RTSMB , a CIFS/SMB implementation written in ANSI C. EBS designed RTSMB from scratch, independently of MS or SAMBA design reference, to run in embedded devices .

Visuality Systems NQ CIFS , a CIFS (SMB) server and client solution for embedded devices — ported to many popular real-time operating systems (RTOSs)

Thursby Software Systems offers a commercial implementation of SMB/CIFS for Mac

OS.

Download