Lecture 1.3. File Exchange in Clusters
- Standards for high-speed and fault-tolerant RAID disk arrays
- Network file systems on the example of NFS

Clusters need fast and reliable tools for file exchange
The performance of a cluster, like that of any other computer, depends not only on the characteristics of its processors and the communication between them (the interconnect). Equally important is the organization of the I/O subsystem, which often receives insufficient attention when high-performance computer systems are built. Many modern tasks require storing large amounts of data with fast access to it. Improving the reliability of data storage is also important.

Levels of RAID (Redundant Array of Independent Disks)
RAID is a data storage technology implemented in software or (more often) in hardware. The main ideas:
- striping data across the disks of the array;
- adding redundant information that allows data to be recovered if a disk fails.

Types (Levels) of RAID arrays
RAID 0 - a striped array with increased performance and reduced fault tolerance
RAID 1 - a mirrored disk array
RAID 2 - reserved for arrays that use Hamming codes
RAID 3 and RAID 4 - striped arrays with a dedicated parity disk
RAID 5 - a striped array with distributed parity (no dedicated parity disk)
RAID 6 - a striped array with two independent distributed parity blocks
Combined arrays:
RAID 1+0 - RAID 0 built from RAID 1 arrays
RAID 5+0 - RAID 0 built from RAID 5 arrays
RAID 6+0 - RAID 0 built from RAID 6 arrays

RAID 0: Striping
A disk array of two or more HDDs with no redundancy. Information is divided into data blocks (Ai) and written to all disks of the array in parallel.
Advantages: Performance increases significantly (almost in proportion to the number of disks).
Disadvantages: The reliability of RAID 0 is significantly lower than that of a single drive and drops further as the number of drives increases, because the failure of any one HDD causes the entire array to fail.

RAID 1: Mirroring
Provides acceptable write speed and a gain in read speed when requests are served in parallel. Has high reliability: the array works as long as at least one of its disks is functioning. If disk failures are independent, the probability that two disks fail at once equals the product of the failure probabilities of each disk. In practice, if one of the disks fails, redundancy should be restored urgently. For this purpose, hot spare disks are recommended with any RAID level (except zero).
Advantages: High availability.
Disadvantages: You pay for two hard drives but get the available capacity of one (the classic case of an array of two drives).

RAID 2: Using Hamming codes
In arrays of this type, disks are divided into two groups: one for data and one for error correction codes (Hamming codes). The number of check disks r needed for n data disks must satisfy 2^r ≥ n + r + 1; for example, 4 data disks need 3 check disks (n − 1 for n = 4). Data is written to the data disks in the same way as in RAID 0, i.e. it is divided into small blocks according to the number of disks intended for storing information. The remaining disks store the error correction codes, which allow information to be recovered if any hard disk fails.
Disadvantage: For small arrays, almost double the number of disks is required. Therefore, this type of RAID array did not become widespread.

RAID 3
In a RAID 3 array of n disks, the data is divided into 1-byte blocks and distributed across n − 1 disks. The remaining disk stores the parity blocks.
In RAID 2, several check disks were used for this purpose, but most of the information on them served on-the-fly error correction, whereas most users are satisfied with simple recovery of information after a disk failure, for which the information on one dedicated disk is sufficient.
Advantages: High speed of reading and writing data; the minimum number of disks needed to create an array is three.
Disadvantages: An array of this type is good only for single-task work with large files. With small blocks, the time to access the data is much longer than the time to read it. The parity disk carries a heavy load, so its reliability drops significantly compared to the disks that store data.

RAID 4
RAID 4 is similar to RAID 3 but divides data into blocks instead of bytes. This partially overcomes the problem of low transfer speed for small amounts of data. Writes are slow because the parity for each block is generated during the write and stored on a single disk. RAID 4 is used, for example, on NetApp devices (NetApp FAS), where its shortcomings are successfully eliminated by operating the disks in a special group write mode provided by the WAFL file system.

RAID 5
In RAID 5, data blocks and checksums are distributed cyclically over all disks of the array, so the disk configuration has no asymmetry. Here a checksum means the result of an XOR operation over the data blocks.
Advantages: The main drawback of RAID levels 2 to 4 is eliminated: they cannot perform write operations in parallel, because a separate disk is used to store the parity information. Overall efficiency also improves.
Disadvantage: RAID 5 performance degrades, especially on random writes, because each small write requires additional disk operations (reading the old data and parity, then writing the new data and parity). However, the advantages prevail, which is why this type of RAID array has become widely used.
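The XOR checksum described for RAID 3 to RAID 5 can be illustrated with a short sketch. It is not a controller implementation: the four-disk array, the block contents and the `parity_disk` rotation rule are assumptions chosen for illustration (real arrays use one of several rotation layouts).

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe, n_disks):
    """Hypothetical rotation rule: parity moves one disk to the left per stripe."""
    return (n_disks - 1 - stripe) % n_disks


# One stripe on an assumed 4-disk array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate the loss of the disk holding data[1] and rebuild its block
# from the surviving data blocks and the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]

# With rotation, each stripe places its parity block on a different disk.
layout = [parity_disk(stripe, 4) for stripe in range(4)]
```

Because XOR is its own inverse, the same function computes the parity and rebuilds a missing block. This also shows why a single parity block protects against the loss of only one disk per stripe.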
RAID 6
RAID 6 is similar to RAID 5 but has a higher degree of reliability: the capacity of two disks is allocated for checksums, and the two checksums are calculated using different algorithms. It requires a more powerful RAID controller. The array remains operational after the simultaneous failure of two drives, which protects against multiple failures. A minimum of four disks is required to organize an array. Typically, RAID 6 causes about a 10-15% drop in disk group performance compared to RAID 5, because the controller has more processing to do (it must calculate a second checksum and read and rewrite more disk blocks on each write).

Summary table of RAID arrays

Clusters need distributed file systems
One of the features of running parallel programs (in particular, those using the MPI standard) is that copies of the program must be present on all nodes of the cluster on which it runs. For example, if the program myprog is located in the folder /home/mpiuser/, then this folder must be present on all nodes of the cluster, and the program must be located in it. This requirement means that copies of the executable program module must somehow be distributed between the cluster nodes. A similar requirement applies to the data stored on disk that the program will use.

Network File System (NFS)
NFS is designed to work in a heterogeneous environment of different machines, operating systems and network architectures; the NFS specification does not depend on them. This independence is achieved through the use of RPC (Remote Procedure Call) primitives built on top of the External Data Representation (XDR) protocol. The NFS specification distinguishes between the services provided by the mount mechanism and the actual remote file access services.
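To make the role of XDR in this machine independence more concrete, the sketch below encodes two values the way the XDR standard prescribes: integers as 4 bytes in big-endian order, and strings as a 4-byte length followed by the bytes, zero-padded to a multiple of four. The helper names and the sample message are hypothetical and do not reproduce the layout of a real NFS or RPC request.

```python
import struct


def xdr_uint(value):
    """Encode an unsigned integer as XDR: 4 bytes, big-endian."""
    return struct.pack(">I", value)


def xdr_string(text):
    """Encode a string as XDR: 4-byte length, the bytes, zero padding to a multiple of 4."""
    raw = text.encode("ascii")
    padding = (4 - len(raw) % 4) % 4
    return xdr_uint(len(raw)) + raw + b"\x00" * padding


def xdr_read_string(buf, offset=0):
    """Decode an XDR string at offset; return the text and the offset of the next item."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    text = buf[start:start + length].decode("ascii")
    padded = (length + 3) // 4 * 4
    return text, start + padded


# Hypothetical message: an illustrative procedure number followed by a path.
message = xdr_uint(1) + xdr_string("/home/mpiuser")
path, next_offset = xdr_read_string(message, 4)
```

Since both sides agree on byte order and alignment, a client and a server with different native integer formats decode the same bytes to the same values.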
Three independent file systems

NFS Mounting: mounts and cascading mounts

NFS Architecture

Three Main Levels of NFS Architecture
• UNIX file system interface (based on the open, read, write and close calls and on file descriptors).
• Virtual File System (VFS) level: distinguishes between local and remote files; local files are then processed according to the types of their file systems.
- VFS activates file-system-specific operations to process local requests according to file system type.
- VFS invokes NFS protocol procedures for remote requests.
• NFS service level: the lowest level of the architecture; implements the NFS protocol.

NFS Protocol
• Network File System (NFS) is a network file system access protocol originally developed by Sun Microsystems in 1984. It is based on the remote procedure call protocol (ONC RPC, Open Network Computing Remote Procedure Call, RFC 1057, RFC 1831). It allows remote file systems to be connected (mounted) over the network and is described in RFC 1094, RFC 1813, RFC 3530 and RFC 5661.
• The NFS protocol is abstracted from both server and client file system types.
• There are many implementations of NFS servers and clients for different operating systems and hardware architectures.
• The most mature version currently in use is NFSv4 (RFC 3010, RFC 3530), which supports various means of authentication (specifically Kerberos and LIPKEY using the RPCSEC_GSS protocol) and ACLs (both POSIX and Windows).

NFS Protocol Properties
• NFS provides clients with transparent access to the files and file system of the server. Any client program that can work with a local file can work equally well with an NFS file, without any modifications to the program itself.
• NFS clients access files on the NFS server by sending RPC requests to the server. This can be implemented using ordinary user processes: an NFS client can be a user process that makes specific RPC calls to a server, which can also be a user process.
• An important part of the latest version of the NFS standard (v4.1) is the pNFS specification, which aims to provide a parallel implementation of file sharing that increases the data rate in proportion to the size and degree of parallelism of the system.

Practical Properties of the NFS Protocol
NFS development had the following goals:
1. NFS should not be limited to the UNIX operating system. Any operating system must be able to implement an NFS server and client.
2. The protocol should not depend on hardware.
3. Simple recovery mechanisms should be implemented in case of server or client failure.
4. Programs must have transparent access to remote files without the use of special path names or libraries and without recompilation.
5. UNIX semantics must be supported for UNIX clients.
6. NFS performance should be comparable to local disk performance.
7. The implementation should be transport-independent.

NFS Application: an Implementation of the NFS Protocol
There are a large number of NFS implementations, both commercial and free, including ones distributed under the GPL. A software implementation of the network file system, called NFS, is available in any general-purpose Linux distribution. The NFS software product consists of two components: a server and a client. The server provides network access to the directories of the underlying file system according to certain access restriction rules. The client is used to connect to the available resources.

Thank you!
Serhii Yakovych Hilgurt (Сергій Якович Гільгурт)
hilgurt@ukr.net, hilhurt_sy@npp.nau.edu.ua
(066)756-43-48 (Viber, Telegram)
G.E. Pukhov Institute for Modelling in Energy Engineering of NAS of Ukraine (Інститут проблем моделювання в енергетиці ім. Г.Є. Пухова НАН України)
http://ipme.kiev.ua