Chapter 6 – Distributed Processing and File Systems

Aims:
• Contrast distributed processing with centralised processing.
• Outline methods used to synchronise events between processes.
• Outline typical implementations of remote processing, especially using RPC.
• Define the strengths of distributed file systems.
• Outline different implementation methods for distributed file systems.

Distributed processing
• Using specialized resources, which would not normally be accessible from a local computer, such as enhanced processing or an increased amount of memory storage.
• Using parallel processing, where a problem is split into a number of parallel tasks, which are distributed over the network.
• Reducing the loading on the local computer, as tasks can be processed on remote computers.

[Figure: Remote processing. The client requests a remote process and passes the process parameters over the network; the server runs the process and returns the results to the client.]

Interprocess communications
• Pipes. Pipes allow data to flow from one process to another, where the processes have a common process origin.
• Named pipe. A named pipe is a pipe which has been given a specific name.
• Message queuing. Message queues allow processes to pass messages between themselves, using either a single message queue or several message queues.
• Semaphores. These are used to synchronize events between processes.
• Shared memory. Shared memory allows processes to interchange data through a defined area of memory.
• Sockets. These are typically used to communicate over a network, between a client and a server (although peer-to-peer connections are also possible).
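The first mechanism in the list above, the pipe, can be sketched in C on a POSIX system. This is a minimal illustration rather than a definitive implementation: the helper name pipe_roundtrip is hypothetical, and it assumes a short message that fits in the pipe buffer. The two processes have a common origin because the writer is created from the reader with fork().

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child process writes msg into a pipe and the
 * parent reads it back into out (at most out_size - 1 bytes).
 * Returns the number of bytes read, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t out_size)
{
    int fd[2];                      /* fd[0] = read end, fd[1] = write end */
    if (out_size == 0 || pipe(fd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {                /* fork failed: release both ends */
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {                 /* child: the writer */
        close(fd[0]);               /* child does not read */
        ssize_t w = write(fd[1], msg, strlen(msg));
        close(fd[1]);
        _exit(w < 0 ? 1 : 0);
    }

    /* parent: the reader */
    close(fd[1]);                   /* so that read() can see end-of-file */
    size_t total = 0;
    ssize_t n;
    while (total < out_size - 1 &&
           (n = read(fd[0], out + total, out_size - 1 - total)) > 0)
        total += (size_t)n;
    close(fd[0]);
    out[total] = '\0';              /* make the result a C string */
    waitpid(pid, NULL, 0);          /* reap the child process */
    return (ssize_t)total;
}
```

Calling pipe_roundtrip("hello", buf, sizeof buf) with a 32-byte buffer should leave the string "hello" in buf. Each process closes the end of the pipe it does not use; otherwise the reader would never see end-of-file.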
[Figure: Interprocess communication between Process A and Process B. Socket: connection over a network. Semaphores: one process gets access to the resource while the other sleeps until the resource is ready. Shared memory: both processes access a shared area of memory. Pipe: Process A | Process B.]

Semaphores
There are only two operations on the semaphore:
• UP (signal). Increments the semaphore value and, if necessary, wakes up a process which is waiting on the semaphore. This is achieved in a single, indivisible operation, to avoid conflicts.
• DOWN (wait). Decrements the semaphore value. If the value is zero there is no decrement, and the process is blocked until the value is greater than zero.

[Figure: Semaphore example. Process A calls wait(), which decrements the semaphore from 1 to 0, runs the code that must be mutually exclusive, and then calls signal(), which increments the semaphore back to 1. Process B calls wait() while the semaphore has a zero value, so it goes to sleep, and wakes up when the semaphore value becomes non-zero.]

Semaphore
V = I - W + S
where:
• I is the initial value of the semaphore.
• W is the number of completed wait operations performed on the semaphore.
• S is the number of signal operations performed on it.
• V is the current value of the semaphore (which must be greater than or equal to zero).
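The bookkeeping behind these quantities can be sketched in C, assuming the usual relation V = I - W + S between them. This is only a single-threaded model of the counters, not a real synchronisation primitive: the sem_model type and its functions are hypothetical names, and a real semaphore would block the caller instead of returning false.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping model of a counting semaphore.
 * It is NOT thread-safe; it only tracks the counters in V = I - W + S. */
typedef struct {
    int initial;   /* I: initial value */
    int waits;     /* W: completed wait (DOWN) operations */
    int signals;   /* S: signal (UP) operations */
} sem_model;

sem_model sem_model_init(int initial)
{
    sem_model s = { initial, 0, 0 };
    return s;
}

/* V: current value, derived from the three counters. */
int sem_model_value(const sem_model *s)
{
    return s->initial - s->waits + s->signals;
}

/* DOWN (wait): completes only if the value is greater than zero.
 * Returns false at the point where a real process would block. */
bool sem_model_wait(sem_model *s)
{
    if (sem_model_value(s) <= 0)
        return false;
    s->waits++;
    return true;
}

/* UP (signal): always completes and so increments the value. */
void sem_model_signal(sem_model *s)
{
    s->signals++;
}
```

With an initial value of 1, the first sem_model_wait() succeeds and leaves the value at 0, a second one fails (Process B would sleep), and a sem_model_signal() restores the value to 1. Because a wait only completes when the value is positive, V never drops below zero.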
Example of deadlock

    #define MAX_BUFF 100    /* maximum items in buffer */
    #define TRUE 1

    /* sleep() and wakeup() are conceptual primitives which block and
       unblock a process; they are not the C library sleep(). */

    int buffer_count = 0;   /* current number of items in buffer */

    void producer_buffer(void);
    void consumer_buffer(void);

    int main(void)
    {
        /* producer_buffer(); on the producer */
        /* consumer_buffer(); on the consumer */
        return 0;
    }

    void producer_buffer(void)
    {
        while (TRUE) {                        /* infinite loop */
            put_item();                       /* put item */
            if (buffer_count == MAX_BUFF)
                sleep();                      /* sleep, if buffer full */
            enter_item();                     /* add item to buffer */
            buffer_count = buffer_count + 1;  /* increment number of items in the buffer */
            if (buffer_count == 1)
                wakeup(consumer_buffer);      /* was buffer empty? */
        }
    }

Example of deadlock (cont.)

    void consumer_buffer(void)
    {
        while (TRUE) {                        /* infinite loop */
            if (buffer_count == 0)
                sleep();                      /* sleep, if buffer empty */
            get_item();                       /* get item */
            buffer_count = buffer_count - 1;  /* decrement number of items in the buffer */
            if (buffer_count == MAX_BUFF - 1)
                wakeup(producer_buffer);      /* if buffer not full anymore, wake up producer */
            consume_item();                   /* remove item */
        }
    }

Deadlock can occur because the test of buffer_count and the call to sleep() are not a single indivisible operation. If the consumer is interrupted after it finds buffer_count to be zero, but before it sleeps, the producer's wakeup is lost, and eventually both processes sleep forever.

Deadlock
• Resource locking. This is where a process is waiting for a resource which will never become available. Some resources are pre-emptive, where processes can release their access to them, and give other processes a chance to access them.
• Starvation. This is where other processes are run, and the deadlocked process is not given enough time to catch the required event.

[Figure: Deadlock example involving A, B, C, D, E and F.]

Four conditions for deadlock
• Mutual exclusion condition. This is where processes get exclusive control of required resources, and will not yield the resource to any other process.
• Wait for condition. This is where processes keep exclusive control of acquired resources while waiting for additional resources.
• No preemption condition. This is where resources cannot be removed from the processes which have gained them, until they have completed their access on them.
• Circular wait condition.
This is a circular chain of processes in which each process holds one or more resources that are requested by the next process in the chain.

[Figure: Circular wait involving A, B, C, D, E and F.]

Deadlock avoidance – Banker's algorithm
• Each process has exclusive access to resources that have been granted to it.
• Allocation is only granted if there is enough allocation left for at least one process to complete, and release its allocated resources.
• Processes which have a rejection on a requested resource must wait until some resources have been released, and any allocation must leave the system in a safe state.

Its problems include:
• Requires processes to define their maximum resource requirement.
• Requires the system to define the maximum amount of a resource.
• Requires a maximum number of processes.
• Requires that processes return their resources in a finite time.
• Processes must wait for allocations to become available. A slow process may stop many other processes from running as it hogs the allocation.

RPC
• Servers. This is software which implements the network services.
• Services. This is a collection of one or more remote programs.
• Programs. These implement one or more remote procedures.
• Procedures. These define the procedures, the parameters and the results of the RPC operation.
• Clients. This is the software that initiates remote procedure calls to services.
• Versions. This allows servers to implement different versions of the RPC software, in order to support previous versions.
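The relationship between programs, versions and procedures in the list above can be sketched as a lookup table in C. This is a single-process illustration with no networking, and it assumes that each remote procedure is identified by a program number, a version number and a procedure number; the text above only names the components. DEMO_PROG, the handler functions and rpc_dispatch are all hypothetical names chosen for the sketch.

```c
#include <stddef.h>

/* A remote procedure in this sketch takes one int and returns one int. */
typedef int (*rpc_proc)(int arg);

typedef struct {
    unsigned long program;    /* which remote program */
    unsigned long version;    /* which version of that program */
    unsigned long procedure;  /* which procedure within the program */
    rpc_proc      handler;    /* local function that does the work */
} rpc_entry;

#define DEMO_PROG 0x20000001UL   /* hypothetical program number */

static int proc_double(int x) { return 2 * x; }
static int proc_square(int x) { return x * x; }

/* Version 1 offers one procedure; version 2 keeps it and adds another,
 * so that clients written for version 1 are still supported. */
static const rpc_entry rpc_table[] = {
    { DEMO_PROG, 1, 1, proc_double },
    { DEMO_PROG, 2, 1, proc_double },
    { DEMO_PROG, 2, 2, proc_square },
};

/* Looks up the requested procedure and runs it.
 * Returns 0 and stores the result on success, or -1 if the
 * program/version/procedure combination is not registered. */
int rpc_dispatch(unsigned long program, unsigned long version,
                 unsigned long procedure, int arg, int *result)
{
    size_t n = sizeof rpc_table / sizeof rpc_table[0];
    for (size_t i = 0; i < n; i++) {
        const rpc_entry *e = &rpc_table[i];
        if (e->program == program && e->version == version &&
            e->procedure == procedure) {
            *result = e->handler(arg);
            return 0;
        }
    }
    return -1;
}
```

A request for procedure 2 of version 1 is rejected, while the same procedure number succeeds under version 2, which is how a server can carry old and new versions side by side.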
Protocol stack
[Figure: Protocol stack. An application program calls a remote process through the layers Application, Presentation, Session (RPC), Transport and Network (TCP/IP or UDP/IP), Data link and Physical (Ethernet, ISDN, FDDI, ATM, etc.).]
• The session layer (RPC) supports the running of remote processes and the passing of run parameters and results.
• The transport layer sets up a virtual connection, and streams data.
• The network layer is responsible for routing data over the network and delivering it to the destination.

RPC operation
[Figure: RPC operation between client and server.]
1. The server process waits for a call.
2. The caller process sends a call message, with all the procedure's parameters.
3. The server reads the parameters and runs the process, while the caller process waits for a response.
4. The server sends the results to the client.
5. The server process waits for the next call.

Distributed file systems
[Figure: Distributed file systems over a network. Labels: administration services; mounted as a local drive; localized file storage (rather than accessing a remote file); distributed databases; networked file system (NFS); centralized configuration (passwords, user IDs, and so on).]

Distributed file system
• File system mirrors the corporate structure. File systems can be distributed over a corporate network, which might span cities, countries or even continents.
• Easier to protect the access rights on file systems. In a distributed file system it is typical to have a strong security policy on the file system, and each file will have an owner who can define the privileges on this file.
• Increased access to single sources of information.
Many users can have access to a single source of information.
• Automated updates. Several copies of the same information can be stored, and when any one of them is updated they are synchronized to keep each of them up to date.
• Improved backup facilities. A user's computer can be switched off, but their files can still be backed up from the distributed file system.

Distributed file systems (cont.)
• Increased reliability. The distributed file system can have a backbone which is constructed from reliable and robust hardware, which is virtually 100% reliable, even when there is a power failure or a hardware fault.
• Larger file systems. In some types of distributed file systems it is possible to build up large file systems from a network of connected disk drives.
• Easier to administer. Administrators can easily view the complete file system.
• Interlinking of databases. Small databases can be linked together to create large databases, which can be configured for a given application. The future may also bring the concept of data mining, where agent programs will search for information with a given profile by interrogating databases on the Internet.
• Limiting file access. Organizations can set up an organization file structure, in which users can have a limited view of the complete file system.

Traditional v.
corporate structure
[Figure: Directory trees. Labels: \\, users, orgname, config, sales, progs, fred, production, research, UK Office, bert, US Office.]

Distributed file system
• Single tree. Drives are mounted over the network to create a single tree (a global file system, for example with /etc, /progs, /user and /sys).
• Forest of drives. Drives are mounted over the network as a forest of drives (for example C:, D:, E: and F:).

    Layer                  Protocol
    Application            NFS, NIS
    Presentation           XDR
    Session                RPC
    Transport              TCP
    Network                IP
    Data link/Physical     Ethernet/Token Ring

RPC procedures and responses
[Figure: An NFS client and an NFS server (the remotely accessed file system) connected over a network.]
• RPC procedures: getattr, setattr, read, write, create, remove, rename, link, symlink, mkdir, rmdir, readdir.
• RPC response: requested data, parameters or status flag (such as NFS_OK and NFSERR_PERM).
• The file system is either mounted onto a single tree or as a forest of drives.

NIS domains
[Figure: An NIS domain with a master NIS server and its clients.]
The master NIS server maintains:

    /etc/passwd       Domain passwords
    /etc/groups       Domain groups
    /etc/hosts        IP addresses and host names
    /etc/rpc          RPC processes
    /etc/networks     Used to map IP addresses to networks
    /etc/protocols    Known network layer protocols
    /etc/services     Known transport layer protocols

    #/etc/protocols
    ip      0    IP
    icmp    1    ICMP
    ggp     3    GGP
    tcp     6    TCP

    #/etc/groups
    root::0:root
    other::1:root,hpdb
    bin::2:root,bin
    sys::3:root,uucp
    freds_grp::4:fred,fred2,fred3

    #/etc/rpc
    portmapper    100000    portmap sunrpc
    rstatd        100001    rstat rstat_svc
    rusersd       100002    rusers
    nfs           100003    nfsprog
    ypserv        100004    ypprog

    #/etc/hosts
    138.38.32.45    bath
    198.4.6.3       compuserve
    193.63.76.2     niss
    148.88.8.84     hensa
    146.176.2.3     janet

    #/etc/passwd
    root:FDEc6.32:1:0:Super user:/user:/bin/csh
    fred:jt.06hLdiSDaA:2:4:Fred Blogs:/user/fred:/bin/csh
    fred2:jtY067SdiSFaA:3:4:Fred Smith:/user/fred2:/bin/csh

    #/etc/services
    ftp       21/tcp
    telnet    23/tcp
    smtp      25/tcp
    pop3      110/tcp

    #/etc/networks
    loopback      127.0.0.0
    localnet      146.176.151.0
    Production    146.176.142.0

NIS domains
[Figure: An NIS domain with a master NIS server, slave NIS servers and an NIS client.]
The master NIS server maintains /etc/passwd, /etc/groups, /etc/hosts, /etc/rpc, /etc/networks, /etc/protocols, /etc/services, and so on. The master sends updates to the NIS slaves.
An NIS client binds to a server as follows:
1. Client is started.
2. Client broadcasts an NIS request to the domain.
3. The client then binds to the first server which responds.
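Each /etc/passwd entry shown for the NIS domain above is a colon-separated record. A minimal C sketch of how such a record might be split into its fields follows. The passwd_entry type, the field sizes and parse_passwd_line are hypothetical names for this illustration, the field order (name, password, user ID, group ID, comment, home directory, shell) is assumed from the standard passwd layout, and the sketch rejects entries with empty fields.

```c
#include <stdio.h>

/* Hypothetical record type; the buffer sizes are arbitrary for the sketch. */
typedef struct {
    char name[32];
    char password[64];   /* encrypted password field */
    int  uid;            /* user ID */
    int  gid;            /* group ID */
    char comment[64];    /* for example the user's full name */
    char home[64];       /* home directory */
    char shell[64];      /* login shell */
} passwd_entry;

/* Splits one passwd-style line into its seven fields.
 * Returns 0 on success, or -1 if the line does not contain
 * seven non-empty fields in the expected format. */
int parse_passwd_line(const char *line, passwd_entry *out)
{
    /* %N[^:] reads up to N characters that are not a colon. */
    int fields = sscanf(line,
        "%31[^:]:%63[^:]:%d:%d:%63[^:]:%63[^:]:%63[^:\n]",
        out->name, out->password, &out->uid, &out->gid,
        out->comment, out->home, out->shell);
    return fields == 7 ? 0 : -1;
}
```

Parsing the sample entry for fred should give the name "fred", user ID 2, group ID 4 and the shell "/bin/csh". The width limits in the format string keep each field within its buffer.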