UNIT - I

LESSON – 1: INTRODUCTION TO DOS AND UNIX

CONTENTS
1.1 Aims and Objectives
1.2 Introduction
1.3 History of DOS
1.3.1 Definition of DOS
1.3.2 History and versions
1.3.3 DOS Working Environment
1.4 History of Unix
1.5 Let us sum up
1.6 Points for Discussion
1.7 Model answers to “Check your Progress”
1.8 Lesson - end activities
1.9 References

1.1 Aims and Objectives

Computer software can be divided into two main categories: application software and system software. Application software consists of the programs for performing tasks particular to the machine's utilization. Examples of application software include spreadsheets, database systems, desktop publishing systems, program development software, and games.

The most important type of system software is the operating system. An operating system has three main responsibilities:
i) Perform basic tasks, such as recognizing input from the keyboard, sending output to the display screen, keeping track of files and directories on the disk, and controlling peripheral devices such as disk drives and printers.
ii) Ensure that different programs and users running at the same time do not interfere with each other.
iii) Provide a software platform on top of which other programs (i.e., application software) can run.

In this lesson we will learn the basics, history, versions and environment of DOS. This lesson also covers the basic definitions and history of UNIX. The objective of this lesson is to make the student aware of the basic concepts and history of both DOS and UNIX.

1.2 Introduction

The functions and working of an operating system can be better understood by beginners if we start with an analogy to the operating system. Consider the office in which you work. There is likely to be a workspace (desk), storage space (filing cabinets, drawers, etc.), and tools (pencils, pens, rulers, calculators, etc.). The layout of the room and the laws of physics (thank you, Newton) dictate how we can accomplish tasks within this office. If we want to work on a file, we must go to a drawer, pull on the drawer to open it, open the folder where the document is, grab the document, take it to the desk, and so on; you get the idea.

On a computer, how you work is defined not by the laws of physics, but by the rules dictated by your computer's hardware and software. The software aspect of those rules is what we call an operating system. So in effect, the programmer (actually a series of teams of many programmers and designers) has created a new environment for working. In order to do something in that environment, you must follow the rules of this new environment. In the physical world, we must follow Newton's laws in order to accomplish any task. Likewise, on a computer we must follow the programmer's laws; the only difference is that Newton did not decide what these rules would be!

The basic functions of an operating system are as follows:
a) File management: analogous to the file cabinet and its use
b) Working with the files: analogous to picking up and preparing to use a calculator or some other tool
c) Configuration of your working environment: analogous to shifting your desk around to suit you better

1.3 History of DOS

1.3.1 Definition of DOS

DOS (an acronym for Disk Operating System) is a tool which allows you to control the operation of the personal computer. DOS is software written to control hardware. DOS can be used for a wide range of tasks.
You will be able to manage well if you master only a small subset of DOS commands and functions. The environment provided by DOS gives the user quick and direct access to the basic utility of the computer. All tasks are accomplished by typing commands at a command prompt (at the cursor).

The operating system offers a hardware abstraction layer that allows development of character-based applications, but not one for accessing most of the hardware, such as graphics cards, printers, or mice. This required programmers to access the hardware directly, resulting in each application having its own set of device drivers for each hardware peripheral. Hardware manufacturers would release specifications to ensure device drivers for popular applications were available.

1.3.2 History and Versions

MS-DOS (and the IBM PC-DOS which was licensed from it) and its predecessor, 86-DOS, were inspired by CP/M (Control Program for Microcomputers), which was the dominant disk operating system for 8-bit Intel 8080 and Zilog Z80 based microcomputers. Tim Paterson at Seattle Computer Products developed a variant of CP/M-80, intended as an internal product for testing SCP's new 8086 CPU card for the S-100 bus; unlike CP/M-80, it ran on the 8086 rather than the 8080 CPU. The system was named 86-DOS (it had initially been called QDOS, which stood for Quick and Dirty Operating System).

An operating system was needed for IBM's 8086-family computers, but negotiations for the use of CP/M on these broke down. IBM approached Microsoft's CEO, Bill Gates, who purchased QDOS from SCP, allegedly for $50,000. This became the Microsoft Disk Operating System, MS-DOS. Microsoft also licensed the system to multiple computer companies, which sold it under their own names. Eventually, Microsoft would require the use of the MS-DOS name, with the exception of the IBM variant, which would continue to be developed concurrently and sold as PC-DOS (this was for IBM's new 'PC' using the 8088 CPU, internally the same as the 8086).

Early versions of Microsoft Windows were little more than a graphical shell for DOS, and later versions of Windows were tightly integrated with MS-DOS. It is also possible to run DOS programs under OS/2 and Linux using virtual-machine emulators. Because of the long existence and ubiquity of DOS in the world of the PC-compatible platform, DOS was often considered to be the native operating system of the PC-compatible platform.

There are alternative versions of DOS, such as FreeDOS and OpenDOS. FreeDOS appeared in 1994, prompted by Microsoft Windows 95, which, unlike Windows 3.11, was sold as a standalone product rather than a shell over MS-DOS. Digital Research attempted to regain the market with DR-DOS, an MS-DOS and CP/M hybrid. Digital Research was later bought out by Novell, and DR DOS became Novell DOS 7. DR DOS would later be part of Caldera (as OpenDOS), Lineo (as DR DOS), and DeviceLogics.

1.3.3 DOS Working Environment

This subsection will give you a general understanding of the command prompt, directories, working with files, file naming conventions, viewing, editing, executing, stopping execution, printing, backup files, and rebooting.

a) Command Prompt

If we take a look at the computer screen, we are likely to see a blank screen with the exception of a few lines, at least one of which begins with a capital letter followed by a colon and a backslash and ends with a greater-than symbol (>):

C:\>

Any line in DOS that begins like this is a command prompt. This prompt is the main way users know where they are in DOS.
Here is how: the C: tells the user that he/she is working within the filespace (disk storage) on the hard drive given the designation C. C is usually reserved for the internal hard disk of a PC. The backslash (\) represents a level in the hierarchy of the file structure. There is always at least one, because it represents the root directory, the very first level of your hard disk.

In a graphical representation of the file structure, folder icons represent directories and document icons represent the actual files in those directories, so you can see how a file can be stored several levels deep on a hard disk. In DOS, the same file, SAMPLE, is represented this way:

C:\DEMO\DOS&WIN\SAMPLES\SAMPLE

or, to mimic a graphic representation:

C:\
 DEMO\
  DOS&WIN\
   SAMPLES\
    SAMPLE

So what C:\DEMO\DOS&WIN\SAMPLES\SAMPLE means is that the file SAMPLE is on the internal hard disk, four levels deep (inside several nested directories). The list of directories (\DEMO\DOS&WIN\SAMPLES\) is referred to as a pathname (following the path of directories will get you to the file). The name of the file itself (SAMPLE) is referred to as the filename.

b) Directory

If you need more help in orienting yourself, it sometimes helps to take a look at the files and directories available where you are by using the DIR command:

C:\>dir

This will give you a listing of all the files and directories contained in the current directory, in addition to some information about the directory itself. You will see the word volume in this information. Volume is simply another word for a disk that the computer has access to: your hard disk is a volume, your floppy disk is a volume, and a server disk (a hard disk served over a network) is a volume. Now you know fancy words for all the parts of the format DOS uses to represent a file.

Volume: C:
Pathname: \DEMO\DOS&WIN\SAMPLES\
Filename: SAMPLE

Here are some helpful extensions of the DIR command:

C:\>dir | more (displays the directory one screen at a time with a <more> prompt; use Ctrl-C to escape)
C:\>dir /w (wide: displays the directory in columns across the screen)
C:\>dir /a (all: displays the directory including hidden files and directories)

c) Working with the files

Understanding how to manage your files on your disk is not the same as being able to use them. In DOS (and most operating systems), there are only two kinds of files: binary files and text files (ASCII). Text files are basic files that can be read or viewed very easily. Binary files, on the other hand, are not easily viewed; as a matter of fact, most binary files are not meant to be viewed but to be executed. When you try to view a binary file (such as with a text editor), your screen fills with garbage and you may even hear beeps.

While there are only two kinds of files, it is often difficult to know which kind a particular file is, because files can have any extension! Fortunately, there is a small set of extensions that have standard meanings, like .txt, .bat, and .dat for text files and .exe and .com for binary executables.

d) File naming conventions

Careful file naming can save time. Always choose names which provide a clue to the file's contents. If you are working with a series of related files, use a number somewhere in the name to indicate which version you have created. This applies only to the filename parameter; most of the file extension parameters you will be using are predetermined or reserved by DOS for certain types of file.
For example: data1.dat, employee.dat.

e) Editing

You can view and edit any text file using a text editor. For example, to open a file named 'employee.txt' in the 'work' directory of the C drive:

C:\work> edit employee.txt

f) Executing

Binary files ending in .exe are usually "executed" by typing the filename as if it were a command. The following command would execute the WordPerfect application, which appears in the disk directory as WP.EXE:

C:\APPS\WP51>wp

Binary files ending in .com often contain one or more commands for execution either through the command prompt or through some program.

g) Stop Execution

If you wish to stop the computer in the midst of executing the current command, you may use the key sequence Ctrl-Break. Ctrl-Break does not always work with non-DOS commands. Some software packages block its action in certain situations, but it is worth trying before you re-boot.

h) Rebooting

In some cases, when all attempts to recover from a barrage of error messages fail, as a last resort you can reboot the computer. To do this, you press the Control, Alternate and Delete keys (CTRL+ALT+DELETE) all at once. If you re-boot, you may lose some of your work: any data active in RAM which has not yet been saved to disk is lost.

1.4 History of Unix

In the 1960s, the Massachusetts Institute of Technology, AT&T Bell Labs, and General Electric worked on an experimental operating system called Multics (Multiplexed Information and Computing Service), which was designed to run on the GE-645 mainframe computer. The aim was the creation of a commercial product, although this was never a great success. Multics was an interactive operating system with many novel capabilities, including enhanced security. The project did develop production releases, but initially these releases performed poorly. AT&T Bell Labs pulled out and deployed its resources elsewhere.

One of the developers on the Bell Labs team, Ken Thompson, continued to develop for the GE-645 mainframe, and wrote a game for that computer called Space Travel. However, he found that the game was too slow on the GE machine and was expensive, costing $75 per execution in scarce computing time. Thompson thus re-wrote the game in assembly language for Digital Equipment Corporation's PDP-7, with help from Dennis Ritchie. This experience, combined with his work on the Multics project, led Thompson to start a new operating system for the PDP-7. Thompson and Ritchie led a team of developers at Bell Labs, including Rudd Canaday, developing a file system as well as the new multi-tasking operating system itself. They included a command line interpreter and some small utility programs. These developments later emerged as a fully functional UNIX operating system. A short history of the development process is given in the following table.
Year            Innovations
1970            Unics, supporting two simultaneous users; the name (a pun on Multics, suggested by Brian Kernighan) was later changed to Unix; ran on the PDP-7 and later on the PDP-11/20; written in assembly language
1973            UNIX rewritten in the C language, making it more concise and compact
1975            Release of Versions 5, 6 and 7, which added the concept of pipes; release of PWB/UNIX and IS/1 (the first commercial Unix); release of the Interdata 7/32 port (the first non-PDP Unix)
1978            Release of UNIX/32V, for the VAX system
1979            Development of Unix versions 8, 9 and 10
1980            Development of the X Window System, a graphical user interface for Unix; release of Unix System V; BSD Unix by Berkeley researchers, containing TCP/IP network code (the accompanying Berkeley Sockets API is a de facto standard for networking APIs)
1982            Release of Xenix, Microsoft's first Unix, for 16-bit microcomputers
1984            SunOS (now Solaris) by Sun Microsystems
1987-89         The most successful Unix-related standard turned out to be the IEEE's POSIX specification, designed as a compromise API readily implemented on both BSD and System V platforms, published in 1988 and soon mandated by the United States government for many of its own systems; SCO Unix, developed from Xenix for the Intel 8086; AT&T added various features into UNIX System V, such as file locking, system administration, streams, new forms of IPC, the Remote File System and TLI; AT&T cooperated with Sun Microsystems and, between 1987 and 1989, merged features from Xenix, BSD, SunOS, and System V into System V Release 4 (SVR4), independently of X/Open
1990            The Common Desktop Environment (CDE), a graphical desktop for UNIX, co-developed by HP, IBM, and Sun as part of the COSE initiative; the Open Software Foundation released OSF/1, their standard UNIX implementation, based on Mach and BSD
1991            Free distributions of FreeBSD, OpenBSD, and NetBSD
1993            Novell developed its own version, UnixWare, merging its NetWare with UNIX System V Release 4
2000 to present In 2005, Sun Microsystems released an open source project called OpenSolaris (based on UNIX System V Release 4); release of other distributions like SchilliX, Belenix, Nexenta, MarTux, and SGI's IRIX; only Solaris, HP-UX, and AIX are still doing relatively well in the market

1.5 Let Us Sum Up

In this lesson we have learned about:
a) the development history of DOS
b) the various versions of DOS
c) the working environment of DOS
d) the history and development of UNIX

1.6 Points for Discussion

a) Discuss the various functionalities of an operating system
b) Discuss the future of DOS

1.7 Model answers to “Check your Progress”

An operating system (OS) is the software that manages the sharing of the resources of a computer and provides programmers with an interface used to access those resources. An operating system processes system data and user input, and responds by allocating and managing tasks and internal system resources as a service to users and programs of the system. At the foundation of all system software, an operating system performs basic tasks such as controlling and allocating memory, prioritizing system requests, controlling input and output devices, facilitating networking and managing file systems. Most operating systems come with an application that provides a user interface for managing the operating system, such as a command line interpreter or graphical user interface. The operating system forms a platform for other system software and for application software.

MS-DOS has effectively ceased to exist as a platform for desktop computing.
Since the release of Windows 9x, MS-DOS was integrated as part of a full product mostly used for bootstrapping, and no longer officially released as a standalone DOS. It was still available, but became increasingly irrelevant as development shifted to the Windows API. Windows XP contains a copy of the core MS-DOS 8 files from Windows Millennium, accessible only by formatting a floppy as an "MS-DOS startup disk". Attempting to run COMMAND.COM from such a disk under the NTVDM results in the message "Incorrect MS-DOS version". With Windows Vista, the files on the startup disk are dated 18 April 2005 but are otherwise unchanged, including the string "MS-DOS Version 8 (C) Copyright 1981-1999 Microsoft Corp" inside COMMAND.COM. Today, DOS is still used in embedded x86 systems due to its simple architecture and minimal memory and processor requirements. The command line interpreter of Windows NT, cmd.exe, maintains most of the same commands and some compatibility with DOS batch files.

1.8 Lesson-end Activities

After learning this chapter, try to discuss among your friends and answer these questions to check your progress.
a) Discuss the DOS operating system and its environment
b) Discuss the UNIX operating system and the versions of UNIX

1.9 References

a) Charles Crowley, Chapter 1 of “Operating Systems – A Design-Oriented Approach”, Tata McGraw-Hill, 2001
b) H.M. Deitel, Chapters 1, 18, 19 of “Operating Systems”, Second Edition, Pearson Education, 2001
c) Andrew S. Tanenbaum, Chapters 1, 7, 8 of “Modern Operating Systems”, PHI, 1996
d) D.M. Dhamdhere, Chapter 9 of “Systems Programming and Operating Systems”, Tata McGraw-Hill, 1997

LESSON – 2: INTRODUCTION TO PROCESS

CONTENTS
2.1 Aims and Objectives
2.2 Introduction to process
2.3 Process states
2.4 Process state transitions
2.5 Operations on process
2.5.1 Suspend and resume
2.5.2 Suspending a process
2.6 Let us sum up
2.7 Points for Discussion
2.8 Model answers to “Check your Progress”
2.9 Lesson end activities
2.10 References

2.1 Aims and Objectives

In this lesson we will learn about the notion of a process, the various states of a process, and process state transitions. The objectives of this lesson are to make the student aware of the basic concepts of processes and their behavior in an operating system.

2.2 Introduction to Process

A process is defined as a program in execution, and it is the unit of work in a modern time-sharing system. Such a system consists of a collection of processes: operating-system processes executing system code and user processes executing user code. All these processes can potentially execute concurrently, with the CPU (or CPUs) multiplexed among them. By switching the CPU between processes, the operating system can make the computer more productive.

A process is more than the program code; it also includes the program counter, the process stack, and the contents of the process registers. The purpose of the process stack is to store temporary data, such as subroutine parameters, return addresses and temporary variables. All this information is stored in the Process Control Block (PCB). The Process Control Block is a record containing many pieces of information associated with a process, including process state, program counter, CPU registers, memory-management information, accounting information, I/O status information, CPU scheduling information, memory limits, and the list of open files.
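To make the PCB concrete, here is a minimal C sketch of such a record. The field names and sizes are illustrative assumptions for teaching purposes, not the declarations of any particular operating system.

/* A minimal, illustrative Process Control Block (PCB). */
#define MAX_OPEN_FILES 16

enum proc_state { NEW, READY, RUNNING, WAITING, TERMINATED };

struct pcb {
    int             pid;                        /* process identifier            */
    enum proc_state state;                      /* current scheduling state      */
    unsigned long   program_counter;            /* saved PC while not running    */
    unsigned long   registers[16];              /* saved CPU registers           */
    unsigned long   mem_base, mem_limit;        /* memory-management information */
    unsigned long   cpu_time_used;              /* accounting information        */
    int             priority;                   /* CPU-scheduling information    */
    int             open_files[MAX_OPEN_FILES]; /* list of open files            */
    struct pcb     *next;                       /* link for ready/blocked lists  */
};

The operating system manipulates processes entirely through records like this: saving registers into the PCB on a switch, and linking PCBs together to form the ready and blocked lists discussed below.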
2.3 Process States

When a process executes, it changes state; the state of a process is generally determined by the current activity of the process. Each process may be in one of the following states:

New: The process is being created.
Running: The process is being executed.
Waiting: The process is waiting for some event to occur.
Ready: The process is waiting to be assigned to a processor.
Terminated: The process has finished execution.

The important point is that only one process can be running on any processor at any time, although many processes may be in the ready and waiting states. The ready processes are kept in a "ready queue" (a queue is a type of data structure, used here to store processes). The operating system creates a process and prepares it to be executed, then moves it into the ready queue. When it is time to select a process to run, the operating system selects one of the jobs from the ready queue and moves it from the ready state to the running state. When the execution of a process has completed, the operating system terminates that process from the running state. Sometimes the operating system terminates a process for other reasons, which include: time limit exceeded, memory unavailable, access violation, protection error, I/O failure, data misuse, and so on.

When the time slot of the processor expires, or if the processor receives an interrupt signal, the operating system shifts the running process to the ready state. For example, let process P1 be executing on the processor, and in the meantime let process P2 generate an interrupt signal to the processor. The processor compares the priorities of P1 and P2. If P1's priority is higher, the processor continues with P1; otherwise it switches to P2, and P1 is moved to the ready state.

A process is put into the waiting state if it needs an event to occur or an I/O task to be done. For example, if a running process needs an I/O device, it is moved to the blocked (or waiting) state. A process in the blocked (waiting) state is moved to the ready state when the event for which it has been waiting occurs. The OS maintains a ready list and a blocked list to store references to processes that are not running.

[Figure: process state diagram with the states New, Ready, Running, Waiting and Terminated]

The new and terminated states are worth a bit more explanation. The former refers to a process that has just been defined (e.g., because a user issued a command at a terminal), and for which the OS has performed the necessary housekeeping chores. The latter refers to a process whose task is not running anymore, but whose context is still being saved (e.g., because a user may want to inspect it using a debugger program).

A simple way to implement this process-handling model in a multiprogramming OS would be to maintain a queue (i.e., a first-in-first-out linear data structure) of processes, put the current process at the end of the queue when it must be paused, and run the first process in the queue. However, it is easy to realize that this simple two-state model does not work. Given that the number of processes that an OS can manage is limited by the available resources, and that I/O events occur at a much larger time scale than CPU events, it may well be the case that the first process in the queue must still wait for an I/O event before being able to restart; even worse, it may happen that most of the processes in the queue must wait for I/O. In this condition the scheduler would just waste its time shifting the queue in search of a runnable process.
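For concreteness, here is a toy C sketch of that naive single-queue scheme, reusing the PCB record sketched earlier. The helper names are hypothetical, and the code ignores everything (timers, interrupts) except the queue discipline itself.

/* Naive two-state scheduling: a single FIFO queue of processes. */
void enqueue(struct pcb **head, struct pcb *p) {   /* append p at the tail */
    p->next = NULL;
    while (*head != NULL)
        head = &(*head)->next;
    *head = p;
}

struct pcb *dequeue(struct pcb **head) {           /* remove the head PCB */
    struct pcb *p = *head;
    if (p != NULL)
        *head = p->next;
    return p;
}

void naive_schedule(struct pcb **queue, struct pcb **current) {
    if (*current != NULL)
        enqueue(queue, *current);  /* pause the running process            */
    *current = dequeue(queue);     /* run whatever happens to be first;    */
                                   /* it may still be waiting for I/O,     */
                                   /* which is exactly the flaw described  */
                                   /* above                                */
}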
A solution is to split the not-running process class according to two possible conditions: processes blocked waiting for an I/O event to occur, and processes paused but nonetheless ready to run when given a chance. A process would then be put from the running into the blocked state by an event-wait transition, would go from the running to the ready state due to a timeout transition, and from blocked to ready due to an event-occurred transition. This model would work fine if the OS had a very large amount of main memory available and none of the processes hogged too much of it, since in this case there would always be a fair number of ready processes. However, because of the costs involved, this scenario is hardly possible, and again the likely result is a list of blocked processes all waiting for I/O.

2.4 Process State Transitions

The various process states can be displayed in a state diagram, with arrows indicating the possible transitions between states. Processes go through various process states, which determine how the process is handled by the operating system kernel. The specific implementations of these states vary in different operating systems, and the names of these states are not standardized, but the general high-level functionality is the same.

When a process is created, it needs to wait for the process scheduler (of the operating system) to set its status to "waiting" and load it into main memory from a secondary storage device (such as a hard disk or a CD-ROM). Once the process has been assigned to a processor by a short-term scheduler, a context switch is performed (loading the process into the processor) and the process state is set to "running", where the processor executes its instructions. If a process needs to wait for a resource (such as waiting for user input, or waiting for a file to become available), it is moved into the "blocked" state until it no longer needs to wait; then it is moved back into the "waiting" state. Once the process finishes execution, or is terminated by the operating system, it is moved to the "terminated" state, where it waits to be removed from main memory. The act of assigning a processor to the first process on the ready list is called dispatching. The OS may use an interval timer to allow a process to run for a specific time interval or quantum.

2.5 Operations on Process

There are various operations that can be performed on a process; they are listed below:
a) create
b) destroy
c) suspend
d) resume
e) change priority
f) block
g) wake up
h) dispatch
i) enable

2.5.1 Suspend and Resume

The OS could then perform a suspend transition on blocked processes, swapping them out to disk and marking their state as suspended (after all, if they must wait for I/O, they might as well do it out of costly RAM), load a previously suspended process into main memory, activate it into the ready state, and go on. However, swapping is an I/O operation in itself, and so at first sight things might seem to get even worse this way. Again the solution is to carefully reconsider the reasons why processes are blocked and swapped, and to recognize that if a process is blocked because it waits for I/O and is then suspended, the I/O event might occur while it sits swapped out on the disk.

[Figure: process state transitions with suspend and resume]

We can thus classify suspended processes into two classes: ready-suspended for those suspended processes whose restarting condition has occurred, and blocked-suspended for those which must still wait.
This classification allows the OS to pick from the pool of ready-suspended processes when it wants to bring a process back into main memory. Provisions must be made for passing processes between the new states. This means allowing for new transitions: activate and suspend between ready and ready-suspended, and between blocked-suspended and blocked as well; and event-occurred transitions from blocked to ready, and from blocked-suspended to ready-suspended as well.

2.5.2 Suspending a process

• Suspension indefinitely removes a process from contention for time on a processor without the process being destroyed
• It is useful for detecting security threats and for software-debugging purposes
• A suspension may be initiated by the process being suspended or by another process
• A suspended process must be resumed by another process

2.6 Let us Sum Up

In this lesson we have learnt about:
a) the process
b) the process states
c) the process control block
d) the process state transitions

2.7 Points for Discussion

a) Discuss the process control block
b) Discuss the process transition diagram

2.8 Model answers to “Check your Progress”

A process is more than the program code; it also includes the program counter, the process stack, and the contents of the process registers. The purpose of the process stack is to store temporary data, such as subroutine parameters, return addresses and temporary variables. All this information is stored in the Process Control Block (PCB). The Process Control Block is a record containing many pieces of information associated with a process, including process state, program counter, CPU registers, memory-management information, accounting information, I/O status information, CPU scheduling information, memory limits, and the list of open files.

2.9 Lesson end Activities

After learning this chapter, try to discuss among your friends and answer these questions to check your progress.
a) Define a process
b) What are the various possible states of a process?

2.10 References

a) Charles Crowley, Chapters 5, 8 of “Operating Systems – A Design-Oriented Approach”, Tata McGraw-Hill, 2001
b) H.M. Deitel, Chapters 3, 4 of “Operating Systems”, Second Edition, Pearson Education, 2001
c) Andrew S. Tanenbaum, Chapter 2 of “Modern Operating Systems”, PHI, 1996
d) D.M. Dhamdhere, Chapter 10 of “Systems Programming and Operating Systems”, Tata McGraw-Hill, 1997

LESSON – 3: INTERRUPT PROCESSING AND CONTEXT SWITCHING

CONTENTS
3.1 Aims and Objectives
3.2 Interrupt Processing
3.2.1 Identifying the High Level Routine
3.2.2 Interrupt Dispatch Table
3.2.3 Rules for Interrupt Processing
3.2.4 Rescheduling while Processing an Interrupt
3.2.5 Interrupt Classes
3.3 Context Switching
3.3.1 Context Switches and Mode Switches
3.3.2 Cost of Context Switching
3.4 Let us Sum Up
3.5 Points for Discussion
3.6 Model answers to “Check your Progress”
3.7 Lesson end Activities
3.8 References

3.1 Aims and Objectives

This lesson focuses on the following concepts:
a) Introduction to interrupt processing
b) Interrupt classes
c) Context switching

The main objective of this lesson is to make the student aware of interrupt processing, interrupt classes and context switching.

3.2 Introduction to Interrupt Processing

An interrupt is an event that alters the sequence in which a processor executes instructions; it is generated by the hardware of the computer system.
Handling interrupts:
• After receiving an interrupt, the processor completes execution of the current instruction, then pauses the current process
• The processor then executes one of the kernel's interrupt-handling functions
• The interrupt handler determines how the system should respond
• Interrupt handlers are stored in an array of pointers called the interrupt vector
• After the interrupt handler completes, the interrupted process is restored and executed, or the next process is executed

Ideally, interrupt handlers would be written in high-level languages, so that they are easy to understand and modify. On the other hand, they should be written in assembly language for efficiency reasons, and because they manipulate hardware registers and use special call/return sequences that cannot be coded in high-level languages. To satisfy both goals, some operating systems employ the following two-level strategy: interrupts branch to low-level interrupt dispatch routines that are written in assembly language. These handle low-level tasks such as saving registers and returning from the interrupt when it has been processed. However, they do little else; they call high-level interrupt routines to do the bulk of interrupt processing, passing them enough information to identify the interrupting device. The OS provides three interrupt dispatchers: one to handle input interrupts, one to handle output interrupts, and one to handle clock interrupts. Input and output dispatchers are separated for convenience, and a special clock dispatcher is provided for efficiency reasons.

NT provides an even more modular structure. A single routine, called the trap handler, handles both traps (called exceptions by NT) and interrupts, saving and restoring registers, which are common to both. If the asynchronous event was an interrupt, it calls an interrupt handler. The task of this routine is to raise the processor priority to that of the interrupting device (so that a lower-level device cannot preempt), call either an internal kernel routine or an external routine called an ISR, and then restore the processor priority. These two routines roughly correspond to our high-level interrupt routine, and the trap handler corresponds to our low-level routine. Thus, NT trades off the efficiency of 2 levels for the reusability of 3 levels. The device table entry for an input or output interrupt handler points at the high-level part of the interrupt handler, which is device-specific, and not the low-level part, which is shared by all devices (except the clock).

[Figure: handling interrupts]

3.2.1 Identifying the High-Level Routine

If all input (output) interrupts branch to the same input (output) dispatch routine, how does the dispatcher know which device-specific interrupt routine to call? The input (output) dispatch routine needs some way to discover the device that interrupted it, so that it can use this information to call the appropriate high-level routine. There are several ways to identify an interrupting device. Here are two of them: the dispatcher may use a special machine instruction to get either the device address or the interrupt vector address of the device (not all machines have such instructions), or the dispatcher may poll devices until it finds one with an interrupt pending.

The following 'trick', common to several operating systems, is used to help identify the interrupting device: the device descriptor (not the device address) is stored in the second word of the interrupt vector.
Recall that this word stores the value to be loaded into the PS register when the interrupt routine is called. Certain OSs use the low-order 4 bits of this word, which are normally used for condition codes, to store the descriptor of the device. These four bits are then used to identify the high-level routine.

3.2.2 Interrupt Dispatch Table

An interrupt dispatch table is used to relate device descriptors to (high-level) interrupt routines. The table is indexed by device descriptor, and each entry contains the following information:
• The address of the input interrupt routine
• An input code, which is passed as an argument to the input interrupt routine
• The address of the output interrupt routine
• An output code, which is passed as an argument to the output interrupt routine

The input (output) dispatch routine uses the device descriptor to access the appropriate dispatch table entry and calls the input (output) interrupt routine with the input (output) code as an argument. The input and output codes can be anything the high-level routines need. In certain OSs, they are initially the minor number of the device; thus only one interrupt routine is needed for all devices of the same type, and the minor number is used to distinguish between these devices.
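As a concrete illustration, the dispatch table entry just described might be declared as follows in C. This is a sketch under the assumptions above; names like intmap and NDEVS are invented for the example, not taken from a real kernel:

/* One interrupt dispatch table entry, indexed by device descriptor. */
#define NDEVS 16

struct intmap {
    void (*in_handler)(int);   /* address of the input interrupt routine     */
    int    in_code;            /* argument for it, e.g. the device's minor
                                  number                                     */
    void (*out_handler)(int);  /* address of the output interrupt routine    */
    int    out_code;           /* argument for the output interrupt routine  */
};

struct intmap dispatch_table[NDEVS];

/* Having recovered descriptor d from the low 4 bits of the saved PS word,
   the input dispatcher would simply call:
       dispatch_table[d].in_handler(dispatch_table[d].in_code);             */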
3.2.3 Rules for Interrupt Processing

There are several rules for interrupt processing. First, interrupt routines should ensure that shared data are not manipulated simultaneously by different processes. One way to do so is to make interrupt routines uninterruptible; thus the PS value stored in the interrupt vector of a device disables interrupts. This value is loaded when the interrupt handler is invoked. As a result, the interrupt routine is uninterruptible while the PS maintains this priority. However, the PS may be changed while the interrupt routine is executing if it calls resched, which may switch to a process that has interrupts enabled. Therefore, the interrupt routine has to ensure that it completes changes to global data structures before it makes any call that results in context switching.

An interrupt routine should also make sure that it does not keep interrupts disabled too long. For instance, if the processor does not accept a character from an input device before another arrives, data will be lost.

Finally, interrupt routines should never call routines that could block the current process (that is, the process executing when the interrupt occurred) in some queue. Otherwise, if the interrupt occurs while the null process is executing, the ready list will be empty; however, resched assumes that there is always some process to execute! Some process should always be runnable so that interrupts can be processed. Thus interrupt routines need to call only those routines that leave the current process in the ready or current state, and may not call routines such as wait.

3.2.4 Rescheduling while Processing an Interrupt

We assumed above that interrupt routines could call resched. We now answer the following questions: First, is it useful to do so? Second, is it safe?

It is useful to call resched from an interrupt routine. An output routine, after removing a character from a buffer, may signal a semaphore to allow another process to write data to the buffer space that it makes available. Similarly, an input routine might send data it obtains from the device to a process. In each case, the routine resched is called.

It is also safe to call resched. Intuitively it may not seem so, because switching to a process that has interrupts enabled could lead to a sequence of interrupts piling up until the stack overflowed. However, such a danger does not exist, for the following reason: a process that is executing an interrupt handler cannot be interrupted again. Some other process, however, can be. Thus a process's stack will hold the PS and PC values for only one interrupt, and there will never be more interrupts pending than the number of processes in the system.

3.2.5 Interrupt Classes

SVC (supervisor call) interrupts: These enable software to respond to signals from hardware. They are initiated by a running process that executes the SVC instruction. An SVC is a user-generated request for a particular system service, such as performing input/output, obtaining more storage, or communicating with the system operator. Requiring a user to request services through an SVC helps keep the OS secure from the user.

I/O interrupts: These are initiated by the input/output hardware. They signal to the CPU that the status of a channel or device has changed. For example, I/O interrupts are caused when an I/O operation completes, when an I/O error occurs, or when a device is made ready.

External interrupts: These are caused by various events, including the expiration of a quantum on an interrupting clock, the pressing of the console's interrupt key by the operator, or the receipt of a signal from another processor on a multiprocessor system.

Restart interrupts: These occur when the operator presses the console's restart button, or when a restart SIGP (signal processor) instruction arrives from another processor on a multiprocessor system.

Program check interrupts: These occur as a program's machine-language instructions are executed. The problems signaled include division by zero, arithmetic overflow, data in the wrong format, an attempt to execute an invalid operation code, an attempt to reference beyond the limits of real memory, an attempt by a user process to execute a privileged instruction, and attempts to reference a protected resource.

Machine check interrupts: These are caused by malfunctioning hardware.

3.3 Context Switching

A context switch (also sometimes referred to as a process switch or a task switch) is the switching of the CPU (central processing unit) from one process or thread to another. A process (also sometimes referred to as a task) is an executing (i.e., running) instance of a program. In Linux, threads are lightweight processes that can run in parallel and share an address space (i.e., a range of memory locations) and other resources with their parent processes (i.e., the processes that created them).

A context is the contents of a CPU's registers and program counter at any point in time. A register is a small amount of very fast memory inside a CPU (as opposed to the slower RAM main memory outside the CPU) that is used to speed the execution of computer programs by providing quick access to commonly used values, generally those in the midst of a calculation. A program counter is a specialized register that indicates the position of the CPU in its instruction sequence and holds either the address of the instruction being executed or the address of the next instruction to be executed, depending on the specific system.
Context switching can be described in slightly more detail as the kernel (i.e., the core of the operating system) performing the following activities with regard to processes (including threads) on the CPU: (1) suspending the progression of one process and storing the CPU's state (i.e., the context) for that process somewhere in memory, (2) retrieving the context of the next process from memory and restoring it in the CPU's registers, and (3) returning to the location indicated by the program counter (i.e., returning to the line of code at which the process was interrupted) in order to resume the process.

A context switch is sometimes described as the kernel suspending execution of one process on the CPU and resuming execution of some other process that had previously been suspended. Although this wording can help clarify the concept, it can be confusing in itself, because a process is, by definition, an executing instance of a program. Thus the wording "suspending progression of a process" might be preferable.

3.3.1 Context Switches and Mode Switches

Context switches can occur only in kernel mode. Kernel mode is a privileged mode of the CPU in which only the kernel runs and which provides access to all memory locations and all other system resources. Other programs, including applications, initially operate in user mode, but they can run portions of the kernel code via system calls. A system call is a request in a UNIX-like operating system by an active process (i.e., a process currently progressing in the CPU) for a service performed by the kernel, such as input/output (I/O) or process creation (i.e., creation of a new process). I/O can be defined as any movement of information to or from the combination of the CPU and main memory (i.e., RAM), that is, communication between this combination and the computer's users (e.g., via the keyboard or mouse), its storage devices (e.g., disk or tape drives), or other computers.

The existence of these two modes in Unix-like operating systems means that a similar, but simpler, operation is necessary when a system call causes the CPU to shift to kernel mode. This is referred to as a mode switch rather than a context switch, because it does not change the current process.

Context switching is an essential feature of multitasking operating systems. A multitasking operating system is one in which multiple processes execute on a single CPU seemingly simultaneously and without interfering with each other. This illusion of concurrency is achieved by means of context switches occurring in rapid succession (tens or hundreds of times per second). These context switches occur as a result of processes voluntarily relinquishing their time in the CPU, or as a result of the scheduler making the switch when a process has used up its CPU time slice.

A context switch can also occur as a result of a hardware interrupt, which is a signal from a hardware device (such as a keyboard, mouse, modem or system clock) to the kernel that an event (e.g., a key press, mouse movement or arrival of data from a network connection) has occurred.

Intel 80386 and higher CPUs contain hardware support for context switches. However, most modern operating systems perform software context switching, which can be used on any CPU, rather than hardware context switching, in an attempt to obtain improved performance. Software context switching was first implemented in Linux for Intel-compatible processors with the 2.4 kernel.
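The three steps above can be sketched in C as follows. This is only an illustration: the save/restore primitives would be short assembly routines in a real kernel, and all names here are invented for the example rather than taken from Linux or any other system.

/* Sketch of a software context switch between two processes. */
enum pstate { P_READY, P_RUNNING };

struct context {
    unsigned long regs[16];   /* general-purpose registers */
    unsigned long pc;         /* program counter           */
};

struct task {
    enum pstate    state;
    struct context ctx;       /* saved CPU state while not running */
};

/* Assembly-level primitives, assumed to exist in a real kernel: */
extern void save_cpu_state(struct context *c);   /* step 1 */
extern void load_cpu_state(struct context *c);   /* steps 2 and 3 */

void context_switch(struct task *prev, struct task *next) {
    save_cpu_state(&prev->ctx);   /* store prev's context in memory       */
    prev->state = P_READY;
    next->state = P_RUNNING;
    load_cpu_state(&next->ctx);   /* restore next's registers and resume
                                     at the location in next->ctx.pc      */
}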
One major advantage claimed for software context switching is that, whereas the hardware mechanism saves almost all of the CPU state, software can be more selective and save only the portion that actually needs to be saved and reloaded. However, there is some question as to how important this really is in increasing the efficiency of context switching. Its advocates also claim that software context switching allows for the possibility of improving the switching code, thereby further enhancing efficiency, and that it permits better control over the validity of the data that is being loaded.

3.3.2 The Cost of Context Switching

Context switching is generally computationally intensive. That is, it requires considerable processor time, which can be on the order of nanoseconds for each of the tens or hundreds of switches per second. Thus, context switching represents a substantial cost to the system in terms of CPU time and can, in fact, be the most costly operation in an operating system. Consequently, a major focus in the design of operating systems has been to avoid unnecessary context switching to the extent possible. However, this has not been easy to accomplish in practice. In fact, although the cost of context switching has been declining when measured in terms of the absolute amount of CPU time consumed, this appears to be due mainly to increases in CPU clock speeds rather than to improvements in the efficiency of context switching itself. One of the many advantages claimed for Linux as compared with other operating systems, including some other Unix-like systems, is its extremely low cost of context switching and mode switching.

Context switches are:
– Performed by the OS to stop executing a running process and begin executing a previously ready process
– Responsible for saving the execution context of the running process to its PCB and loading the ready process's execution context from its PCB
– Required to be transparent to processes
– Time during which the processor performs no "useful" computation, so the OS must minimize context-switching time
– Performed in hardware by some architectures

3.4 Let us Sum Up

In this lesson we have learnt about:
a) interrupt processing
b) the interrupt classes
c) context switching

3.5 Points for Discussion

a) Discuss context switching
b) Discuss the interrupt classes

3.6 Model answers to “Check your Progress”

A context switch (also sometimes referred to as a process switch or a task switch) is the switching of the CPU (central processing unit) from one process or thread to another. A process (also sometimes referred to as a task) is an executing (i.e., running) instance of a program. In Linux, threads are lightweight processes that can run in parallel and share an address space (i.e., a range of memory locations) and other resources with their parent processes (i.e., the processes that created them).

3.7 Lesson - end Activities

After learning this chapter, try to discuss among your friends and answer these questions to check your progress.
a) Discuss interrupts
b) Discuss the various types of interrupt processing

3.8 References

a) Charles Crowley, Chapters 5, 8 of “Operating Systems – A Design-Oriented Approach”, Tata McGraw-Hill, 2001
b) H.M. Deitel, Chapters 3, 4 of “Operating Systems”, Second Edition, Pearson Education, 2001
c) Andrew S. Tanenbaum, Chapter 2 of “Modern Operating Systems”, PHI, 1996
d) D.M. Dhamdhere, Chapter 10 of “Systems Programming and Operating Systems”, Tata McGraw-Hill, 1997
LESSON – 4: SEMAPHORES

CONTENTS
4.1 Aims and Objectives
4.2 Introduction to process synchronization
4.3 Critical section problem
4.4 Semaphores
4.5 Classical problems of synchronization
4.6 Let us Sum Up
4.7 Points for discussion
4.8 Model answers to Check your progress
4.9 Lesson end Activities
4.10 References

4.1 Aims and Objectives

The aim of this lesson is to learn the concepts of process synchronization, the critical section problem, semaphores, and the classical problems of synchronization.

The objectives of this lesson are to make the student aware of the following concepts:
a) process synchronization
b) the critical section problem
c) semaphores
d) the classical problems of synchronization

4.2 Introduction to process synchronization

Process synchronization will become clear with the following example. Consider the code for a producer and a consumer that share a bounded buffer and a counter of full slots:

Producer:

while (1) {
    while (counter == buffersize);   /* busy-wait while the buffer is full */
    buffer[in] = nextproduced;
    in = (in + 1) % buffersize;
    counter++;
}

Consumer:

while (1) {
    while (counter == 0);            /* busy-wait while the buffer is empty */
    nextconsumed = buffer[out];
    out = (out + 1) % buffersize;
    counter--;
}

Both pieces of code are correct separately, but will not function correctly when executed concurrently. This is because counter++ may be executed in machine language as three separate statements:

register1 = counter
register1 = register1 + 1
counter = register1

and counter-- as:

register2 = counter
register2 = register2 - 1
counter = register2

The interleaved execution of these statements by the two processes may lead, for example, to the following sequence (starting with counter = 5):

a) the producer executes register1 = counter (register1 = 5)
b) the producer executes register1 = register1 + 1 (register1 = 6)
c) the consumer executes register2 = counter (register2 = 5)
d) the consumer executes register2 = register2 - 1 (register2 = 4)
e) the producer executes counter = register1 (counter = 6)
f) the consumer executes counter = register2 (counter = 4)

You can see that the final value counter = 4 is wrong, as there are actually 5 full buffers. A situation like this, where several processes access and manipulate the same data concurrently and the outcome of the execution depends on the particular order in which the accesses take place, is called a race condition. To avoid it, we have to make sure that only one process at a time manipulates the counter. Such situations occur frequently in operating systems, so we require some form of synchronization of processes.

4.3 Critical section problem

Each process has a segment of code, called a critical section, in which the process may be changing a common variable, updating a table, writing a file, and so on. The general structure is:

do {
    entry section
        critical section
    exit section
        remainder section
} while (1);

A solution should satisfy the following three requirements:

a) Mutual exclusion: If a process is executing in its critical section, then no other processes can be executing in their critical sections.
b) Progress: If no process is executing in its critical section and some processes wish to enter their critical sections, then only those processes that are not executing in their remainder section can participate in the decision on which will enter its critical section next, and this selection cannot be postponed indefinitely.
c) Bounded waiting: There exists a bound on the number of times that other processes are allowed to enter their critical sections after a process has made a request to enter its critical section and before that request is granted.
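Before looking at an n-process solution, it may help to see the classic two-process solution known as Peterson's algorithm, which satisfies all three requirements. This sketch is standard textbook material rather than part of this lesson's text; process i executes the code, and j = 1 - i denotes the other process:

/* Peterson's algorithm for two processes. Shared variables: */
int flag[2] = {0, 0};   /* flag[i] == 1 means process i wants to enter */
int turn = 0;           /* which process yields when both want in      */

/* code for process i (i is 0 or 1, j = 1 - i) */
do {
    flag[i] = 1;                    /* entry section                   */
    turn = j;                       /* give the other process priority */
    while (flag[j] && turn == j)
        ;                           /* busy-wait until it is safe      */
        /* CRITICAL SECTION */
    flag[i] = 0;                    /* exit section                    */
        /* REMAINDER SECTION */
} while (1);

Mutual exclusion holds because both processes could pass the while test simultaneously only if turn held two values at once, which is impossible; progress and bounded waiting follow because a process waits only while the other process is interested and it is the other's turn.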
Many more general solutions are available and can be implemented in various ways. One of the multiprocess solutions for the critical section problem, Lamport's bakery algorithm, is given below.

Data structures:

boolean choosing[n];
int number[n];

/* code for process i */
do {
    choosing[i] = true;
    number[i] = max(number[0], number[1], ..., number[n-1]) + 1;
    choosing[i] = false;
    for (j = 0; j < n; j++) {
        while (choosing[j]);
        /* (number[j], j) < (number[i], i) is a lexicographic comparison
           of (ticket number, process id) pairs */
        while ((number[j] != 0) && ((number[j], j) < (number[i], i)));
    }
        CRITICAL SECTION
    number[i] = 0;
        REMAINDER SECTION
} while (1);

4.4 Semaphores

The solutions described in the sections above cannot easily be generalized. To overcome this, we have a synchronization tool called a semaphore, proposed by Dijkstra. A semaphore is a pair composed of an integer counter and a queue of waiting processes; apart from initialization, it is accessed only through two standard atomic operations, wait and signal:

wait: decrease the counter by one; if it becomes negative, block the process and enter its id in the queue.
signal: increase the counter by one; if it is still not positive (i.e., processes are waiting), unblock the first process of the queue, removing its id from the queue itself.

A simple busy-waiting version of these operations, which illustrates the semantics without a queue, can be written as:

wait(S) {
    while (S <= 0);   /* busy-wait */
    S--;
}

signal(S) {
    S++;
}

The atomicity of the above operations is an essential requirement: mutual exclusion while accessing a semaphore must be strictly enforced by the operating system. This is often done by implementing the operations themselves as uninterruptible system calls.

It is easy to see how a semaphore can be used to enforce mutual exclusion on a shared resource: a semaphore is assigned to the resource, it is shared among all processes that need to access the resource, and its counter is initialized to 1. A process then waits on the semaphore upon entering the critical section for the resource, and signals on leaving it. The first process will get access. If another process arrives while the former is still in the critical section, it will be blocked, and so will further processes. In this situation the absolute value of the counter is equal to the number of processes waiting to enter the critical section. Every process leaving the critical section will let a waiting process use the resource by signaling the semaphore.

In order to fully understand semaphores, we'll discuss them briefly before engaging any system calls and operational theory. The name semaphore is actually an old railroad term, referring to the crossroad "arms" that prevent cars from crossing the tracks at intersections. The same can be said about a simple semaphore set. If the semaphore is on (the arms are up), then a resource is available (cars may cross the tracks). However, if the semaphore is off (the arms are down), then resources are not available (the cars must wait). While this simple example may serve to introduce the concept, it is important to realize that semaphores are actually implemented as sets, rather than as single entities. Of course, a given semaphore set might only have one semaphore, as in our railroad example.

Perhaps another approach to the concept of semaphores is to think of them as resource counters. Let's apply this concept to another real-world scenario. Consider a print spooler, capable of handling multiple printers, with each printer handling multiple print requests. A hypothetical print spool manager will utilize semaphore sets to monitor access to each printer. Assume that in our corporate print room, we have 5 printers online. Our print spool manager allocates a semaphore set with 5 semaphores in it, one for each printer on the system.
Since each printer is physically capable of printing only one job at a time, each of the five semaphores in our set will be initialized to a value of 1 (one), meaning that they are all online and accepting requests.

John sends a print request to the spooler. The print manager looks at the semaphore set and finds the first semaphore which has a value of one. Before sending John's request to the physical device, the print manager decrements the semaphore for the corresponding printer by one. Now that semaphore's value is zero. In the world of System V semaphores, a value of zero represents 100% resource utilization on that semaphore. In our example, no other request can be sent to that printer until its semaphore is no longer equal to zero. When John's print job has completed, the print manager increments the value of the semaphore which corresponds to the printer. Its value is now back up to one (1), which means it is available again. Naturally, if all 5 semaphores had a value of zero, that would indicate that all the printers are busy printing requests and no printer is available.

Although this was a simple example, please do not be confused by the initial value of one (1) which was assigned to each semaphore in the set. Semaphores, when thought of as resource counters, may be initialized to any positive integer value, and are not limited to being either zero or one. If it were possible for each of our five printers to handle 10 print jobs at a time, we could initialize each of our semaphores to 10, decrementing by one for every new job, and incrementing by one whenever a print job was finished. Semaphores have a close working relationship with shared memory segments, acting as a watchdog to prevent multiple writes to the same memory segment.

4.5 Classical Problems of Synchronization

There are several classical problems of synchronization, among them the bounded-buffer problem, the readers-writers problem, and the dining philosophers problem. In this section we explain only the solution of the bounded-buffer problem, namely the producer-consumer problem. The solution for the producer-consumer problem can be achieved using semaphores as shown in the following code, where mutex, empty and full are semaphores initialized to 1, n and 0 respectively.

CODE FOR PRODUCER

do {
    ...
    /* produce an item in nextp */
    ...
    wait(empty);
    wait(mutex);
    ...
    /* add nextp to buffer */
    ...
    signal(mutex);
    signal(full);
} while (1);

CODE FOR CONSUMER
4.6 Let us Sum Up

In this lesson we have learnt about
a) process synchronization,
b) the critical section problem,
c) semaphores,
d) and the classical problems of synchronization.

4.7 Points for Discussion

After learning this chapter, try to discuss among your friends and answer these questions to check your progress.
a) What is synchronization?
b) Define the critical section problem.
c) Define a semaphore.
d) Discuss the need for semaphores.
e) Discuss synchronization based on an example.

4.8 Model answers to "Check your Progress"

A semaphore, in computer science, is a protected variable (or abstract data type) which constitutes the classic method for restricting access to shared resources, such as shared memory, in a multiprogramming environment. A semaphore is a counter for a set of available resources, rather than a locked/unlocked flag of a single resource. It was invented by Edsger Dijkstra and first used in the THE operating system. The value of the semaphore is initialized to the number of equivalent shared resources being controlled. In the special case where there is a single equivalent shared resource, the semaphore is called a binary semaphore. The general-case semaphore is often called a counting semaphore. Semaphores are the classic solution to the dining philosophers problem, although they do not prevent all resource deadlocks.

4.9 Lesson-end Activities

Try to implement a semaphore in C under Unix.

4.10 References

a) Charles Crowley, Chapter 8 of "Operating Systems – A Design-Oriented Approach", Tata McGraw-Hill, 2001
b) H.M. Deitel, Chapter 4, 5 of "Operating Systems", Second Edition, Pearson Education, 2001
c) Andrew S. Tanenbaum, Chapter 11 of "Modern Operating Systems", PHI, 1996
d) D.M. Dhamdhere, Chapter 13 of "Systems Programming and Operating Systems", Tata McGraw-Hill, 1997

LESSON – 5: DEADLOCK AND INDEFINITE POSTPONEMENT

CONTENTS
5.1 Aims and Objectives
5.2 Introduction
5.3 Characteristics of Deadlock
5.4 Deadlock prevention and avoidance
5.5 Deadlock detection and recovery
5.6 Let us Sum Up
5.7 Points for discussion
5.8 Model answers to Check your Progress
5.9 Lesson-end Activities
5.10 References

5.1 Aims and Objectives

The aim of this lesson is to learn the concept of deadlock and indefinite postponement. The objectives of this lesson are to make the student aware of the following concepts:
a) Deadlock prevention
b) Deadlock avoidance
c) Deadlock detection
d) Deadlock recovery

5.2 Introduction

One problem that arises in multiprogrammed systems is deadlock. A process or thread is in a state of deadlock (or is deadlocked) if it is waiting for a particular event that will not occur. In a system deadlock, one or more processes are deadlocked. Most deadlocks develop because of the normal contention for dedicated resources (i.e., resources that may be used by only one user at a time). Circular wait is characteristic of deadlocked systems.

One example of a system that is prone to deadlock is a spooling system. A common solution is to restrain the input spoolers so that, when the spooling files begin to reach some saturation threshold, they do not read in more print jobs. Today's systems allow printing to begin before the job is completed, so that a full, or nearly full, spooling file can be emptied or partially cleared even while a job is still executing. This concept has been applied to streaming audio and video clips, where the audio and video begin to play before the clips are fully downloaded.

In any system that keeps processes waiting while it makes resource-allocation and process-scheduling decisions, it is possible to delay indefinitely the scheduling of a process while other processes receive the system's attention. This situation, variously called indefinite postponement, indefinite blocking, or starvation, can be as devastating as deadlock. Indefinite postponement may occur because of biases in a system's resource scheduling policies. Some systems prevent indefinite postponement by increasing a process's priority as it waits for a resource; this technique is called aging.

Resources can be preemptable (e.g., processors and main memory), meaning that they can be removed from a process without loss of work, or nonpreemptible (e.g., tape drives and optical scanners), meaning that they cannot be removed from the processes to which they are assigned. Data and programs certainly are resources that the operating system must control and allocate.
Code that cannot be changed while in use is said to be reentrant. Code that may be changed but is reinitialized each time it is used is said to be serially reusable. Reentrant code may be shared by several processes simultaneously, whereas serially reusable code may be used by only one process at a time. When we call particular resources shared, we must be careful to state whether they may be used by several processes simultaneously or by only one of several processes at a time. The latter kind, serially reusable resources, are the ones that tend to become involved in deadlocks.

5.3 Characteristics of Deadlock

The four necessary conditions for deadlock are:
a) A resource may be acquired exclusively by only one process at a time (mutual exclusion condition);
b) A process that has acquired an exclusive resource may hold it while waiting to obtain other resources (wait-for condition, also called the hold-and-wait condition);
c) Once a process has obtained a resource, the system cannot remove the resource from the process's control until the process has finished using the resource (no-preemption condition);
d) Two or more processes are locked in a "circular chain" in which each process in the chain is waiting for one or more resources that the next process in the chain is holding (circular-wait condition).

Because these are necessary conditions for a deadlock to exist, the existence of a deadlock implies that each of them must be in effect. Taken together, all four conditions are necessary and sufficient for deadlock to exist (i.e., if all these conditions are in place, the system is deadlocked). The four major areas of interest in deadlock research are deadlock prevention, deadlock avoidance, deadlock detection, and deadlock recovery.

5.4 Deadlock prevention and avoidance

5.4.1 Deadlock prevention

In deadlock prevention our concern is to condition a system to remove any possibility of deadlocks occurring. Havender observed that a deadlock cannot occur if a system denies any of the four necessary conditions. The first necessary condition, namely that processes claim exclusive use of the resources they require, is not one that we want to break, because we specifically want to allow dedicated (i.e., serially reusable) resources. Denying the "wait-for" condition requires that all of the resources a process needs to complete its task be requested at once, which can result in substantial resource underutilization and raises concerns over how to charge for resources. Denying the "no-preemption" condition can be costly, because processes lose work when their resources are preempted. Denying the "circular-wait" condition uses a linear ordering of resources to prevent deadlock. This strategy can increase efficiency over the other strategies, but not without difficulties.

5.4.2 Deadlock avoidance

In deadlock avoidance the goal is to impose less stringent conditions than in deadlock prevention, in an attempt to get better resource utilization. Avoidance methods allow the possibility of deadlock to loom, but whenever a deadlock is approached, it is carefully sidestepped. Dijkstra's Banker's Algorithm is an example of a deadlock avoidance algorithm. In the Banker's Algorithm, the system ensures that a process's maximum resource need does not exceed the number of available resources. The system is said to be in a safe state if the operating system can guarantee that all current processes can complete their work within a finite time. If not, then the system is said to be in an unsafe state.
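To make safe and unsafe states concrete, consider a small hypothetical example (the numbers are illustrative, not from the lesson text). Suppose a system has 12 tape drives and three processes: P0 has a maximum need of 10 drives and currently holds 5; P1 has a maximum need of 4 and holds 2; P2 has a maximum need of 9 and holds 2, leaving 3 drives free. This state is safe, because the sequence <P1, P0, P2> lets every process finish: P1 can take its remaining 2 drives, run to completion, and release all 4, leaving 5 free; P0 can then take its remaining 5, finish, and release 10; and P2 can then take its remaining 7. If, however, P2 were granted one more drive (leaving only 2 free), the state would become unsafe: after P1 finished, only 4 drives would be free, which is enough neither for P0 (needing 5 more) nor for P2 (needing 6 more), so the system could no longer guarantee completion.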
Dijkstra's Banker's Algorithm requires that resources be allocated to processes only when the allocations result in safe states. It has a number of weaknesses (such as requiring a fixed number of processes and resources) that prevent it from being implemented in real systems. A deadlock avoidance algorithm dynamically examines the resource allocation state to ensure that there can never be a circular wait condition. The resource allocation state is defined by the number of available and allocated resources, and the maximum demands of the processes. A state is safe if the system can allocate resources to each process (up to its maximum) in some order and still avoid a deadlock.

5.4.3 Banker's algorithm

Let 'Available' be a vector of length m indicating the number of available resources of each type, 'Max' be an n x m matrix defining the maximum demand of each process, 'Allocation' be an n x m matrix defining the number of resources of each type currently allocated to each process, and let 'Need' be an n x m matrix indicating the remaining resource need of each process. Let Requesti be the request vector for process Pi. If Requesti[j] = k, then process Pi wants k instances of resource type Rj. When a request for resources is made by process Pi, the following actions are taken:

a) If Requesti <= Needi, proceed to step b. Otherwise the process has exceeded its maximum claim.
b) If Requesti <= Available, proceed to step c. Otherwise the resources are not available and Pi must wait.
c) The system pretends to have allocated the requested resources to process Pi by modifying the state as follows:

    Available = Available - Requesti
    Allocationi = Allocationi + Requesti
    Needi = Needi - Requesti

If the resulting resource allocation state is safe, the transaction is completed and process Pi is allocated its resources. If the new state is unsafe, Pi must wait for Requesti and the old resource allocation state is restored.

5.4.4 Safety Algorithm

The algorithm for finding out whether a system is in a safe state can be described as follows:

a) Let Work and Finish be vectors of length m and n respectively. Initialize Work = Available and Finish[i] = false for all i.
b) Find an i such that Finish[i] = false and Needi <= Work. If no such i exists, go to step d.
c) Set Work = Work + Allocationi and Finish[i] = true, then go to step b.
d) If Finish[i] = true for all i, then the system is in a safe state.

5.5 Deadlock detection and recovery

Deadlock detection methods are used in systems in which deadlocks can occur. The goal is to determine if a deadlock has occurred, and to identify those processes and resources involved in the deadlock. Deadlock detection algorithms can incur significant runtime overhead. To facilitate the detection of deadlocks, a directed graph indicates resource allocations and requests. Deadlock can be detected using graph reductions. If a process's resource requests may be granted, then we say that a graph may be reduced by that process. If a graph can be reduced by all its processes, then there is no deadlock. If a graph cannot be reduced by all its processes, then the irreducible processes constitute the set of deadlocked processes in the graph. A detection algorithm analogous to the safety algorithm is:

a) Let Work and Finish be vectors of length m and n respectively. Initialize Work = Available; for i = 1, ..., n, if Allocationi != 0 then Finish[i] = false, else Finish[i] = true.
b) Find an i such that Finish[i] = false and Requesti <= Work. If no such i exists, go to step d.
c) Set Work = Work + Allocationi and Finish[i] = true, then go to step b.
d) If Finish[i] = false for some i, then the system is in a deadlock state.
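The safety procedure above translates almost directly into code. The following is a minimal C sketch of the safety algorithm for n processes and m resource types; the state in main is a small hypothetical example, and the array sizes and names are illustrative assumptions.

    #include <stdbool.h>
    #include <stdio.h>

    #define N 3   /* number of processes (illustrative) */
    #define M 2   /* number of resource types (illustrative) */

    /* Returns true if the state described by available/allocation/need is safe. */
    bool is_safe(int available[M], int allocation[N][M], int need[N][M]) {
        int work[M];
        bool finish[N] = { false };
        for (int j = 0; j < M; j++) work[j] = available[j];

        for (int done = 0; done < N; ) {
            bool advanced = false;
            for (int i = 0; i < N; i++) {
                if (finish[i]) continue;
                bool can_run = true;
                for (int j = 0; j < M; j++)
                    if (need[i][j] > work[j]) { can_run = false; break; }
                if (can_run) {                        /* process i can finish...   */
                    for (int j = 0; j < M; j++)
                        work[j] += allocation[i][j];  /* ...and release resources  */
                    finish[i] = true;
                    advanced = true;
                    done++;
                }
            }
            if (!advanced) break;                     /* no runnable process found */
        }
        for (int i = 0; i < N; i++)
            if (!finish[i]) return false;
        return true;
    }

    int main(void) {
        int available[M] = {1, 1};
        int allocation[N][M] = {{1, 0}, {0, 1}, {1, 1}};
        int need[N][M]       = {{1, 1}, {1, 0}, {0, 1}};
        printf("state is %s\n", is_safe(available, allocation, need) ? "safe" : "unsafe");
        return 0;
    }

The detection algorithm is structurally identical: initialize Finish from the allocation matrix and compare Request rather than Need against Work.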
Deadlock recovery methods are used to clear deadlocks from a system so that it may operate free of the deadlocks, and so that the deadlocked processes may complete their execution and free their resources. Recovery typically requires that one or more of the deadlocked processes be flushed from the system. The suspend/resume mechanism allows the system to put a temporary hold on a process (temporarily preempting its resources) and, when it is safe to do so, resume the held process without loss of work. Checkpoint/rollback facilitates suspend/resume capabilities by limiting the loss of work to the time at which the last checkpoint (i.e., saved state of the system) was taken. When a process in a system terminates (by accident, or intentionally as the result of a deadlock recovery algorithm), the system performs a rollback by undoing every operation related to the terminated process that occurred since the last checkpoint. To ensure that data in the database remains in a consistent state when deadlocked processes are terminated, database systems typically perform resource allocations using transactions.

In personal computer systems and workstations, deadlock has generally been viewed as a limited annoyance. Some systems implement the basic deadlock prevention methods suggested by Havender, while others ignore the problem; these methods seem to be satisfactory. While ignoring deadlocks may seem dangerous, it can actually be rather efficient: if deadlock is rare, the processor time devoted to checking for deadlocks is largely wasted and can significantly reduce system performance. However, given current trends, deadlock will continue to be an important area of research as the number of concurrent operations and the number of resources become large, increasing the likelihood of deadlock in multiprocessor and distributed systems. Also, many real-time systems, which are becoming increasingly prevalent, require deadlock-free resource allocation.

5.6 Let us Sum Up

In this lesson we have learned about the characteristics of deadlock, deadlock prevention mechanisms, deadlock avoidance using the banker's and safety algorithms, and deadlock detection and recovery.

5.7 Points for discussion

After learning this chapter, try to discuss among your friends and answer these questions to check your progress.
a) What are the characteristics of deadlock?
b) Discuss the deadlock prevention mechanisms.
c) Discuss the banker's and safety algorithms.
d) How can deadlock be detected?
e) What are the steps needed for deadlock recovery?

5.8 Model answers to Check your Progress

The characteristics of deadlock can be explained by means of the four necessary conditions for deadlock, namely:
a) A resource may be acquired exclusively by only one process at a time (mutual exclusion condition);
b) A process that has acquired an exclusive resource may hold it while waiting to obtain other resources (wait-for condition, also called the hold-and-wait condition);
c) Once a process has obtained a resource, the system cannot remove the resource from the process's control until the process has finished using the resource (no-preemption condition);
d) Two or more processes are locked in a "circular chain" in which each process in the chain is waiting for one or more resources that the next process in the chain is holding (circular-wait condition).
Because these are necessary conditions for a deadlock to exist, the existence of a deadlock implies that each of them must be in effect. Taken together, all four conditions are necessary and sufficient for deadlock to exist (i.e., if all these conditions are in place, the system is deadlocked).

5.9 Lesson-end Activities

Try to write a program in C/C++ to implement the banker's and safety algorithms.

5.10 References

a) Charles Crowley, Chapter 8 of "Operating Systems – A Design-Oriented Approach", Tata McGraw-Hill, 2001
b) H.M. Deitel, Chapter 6 of "Operating Systems", Second Edition, Pearson Education, 2001
c) Andrew S. Tanenbaum, Chapter 6 of "Modern Operating Systems", PHI, 1996
d) D.M. Dhamdhere, Chapter 12 of "Systems Programming and Operating Systems", Tata McGraw-Hill, 1997

UNIT – II

LESSON – 6: STORAGE MANAGEMENT

CONTENTS
6.1 Aims and Objectives
6.2 Introduction
6.3 Contiguous Storage Allocation
6.4 Non-Contiguous Storage Allocation
6.5 Fixed Partitions Multiprogramming
6.6 Variable Partitions Multiprogramming
6.7 Multiprogramming with Storage Swapping
6.8 Let us Sum Up
6.9 Points for discussion
6.10 Model answers to Check your Progress
6.11 Lesson-end Activities
6.12 References

6.1 Aims and Objectives

The aim of this lesson is to learn the concept of real storage management strategies. The objectives of this lesson are to make the student aware of the following concepts:
a) Contiguous storage allocation
b) Non-contiguous storage allocation
c) Fixed partition multiprogramming
d) Variable partition multiprogramming
e) and multiprogramming with storage swapping

6.2 Introduction

The organization and management of the main memory (also called primary or real memory) of a computer system has been one of the most important factors influencing operating systems design. Regardless of what storage organization scheme we adopt for a particular system, we must decide what strategies to use to obtain optimal performance. Storage management strategies are of three types, as described below:

a) Fetch strategies – concerned with when to obtain the next piece of program or data for transfer to main storage from secondary storage.
   i. Demand fetch – in which the next piece of program or data is brought into main storage when it is referenced by a running program.
   ii. Anticipatory fetch – in which we make guesses about future program behavior, which can yield improved system performance.
b) Placement strategies – concerned with determining where in main storage to place an incoming program. Examples are first fit, best fit and worst fit.
c) Replacement strategies – concerned with determining which piece of program or data to displace to make room for incoming programs.

6.3 Contiguous Storage Allocation

In contiguous storage allocation each program has to occupy a single contiguous block of storage locations. The simplest memory management scheme is the bare machine concept, where the user is provided with complete control over the entire memory space. The next simplest scheme is to divide memory into two sections, one for the user and one for the resident monitor of the operating system. Protection hardware can be provided in the form of a fence register to protect the monitor code and data from changes by the user program. The resident monitor memory management scheme may seem of little use, since it appears to be inherently single user.
It was, however, used with swapping: when the system switched to the next user, the current contents of user memory were written out to backing storage and the memory of the next user was read in. This scheme is called swapping.

6.4 Non-Contiguous Storage Allocation

Memory is divided into a number of regions or partitions. Each region may hold one program to be executed. Thus the degree of multiprogramming is bounded by the number of regions. When a region is free, a program is selected from the job queue and loaded into the free region. The two major schemes are multiple contiguous fixed partition allocation and multiple contiguous variable partition allocation.

6.5 Fixed Partitions Multiprogramming

Fixed partitions multiprogramming is also called multiprogramming with a fixed number of tasks (MFT), or multiple contiguous fixed partition allocation. MFT has the following properties:
- Several users simultaneously compete for system resources (switching between I/O jobs and calculation jobs, for instance).
- Relocation and transfers between partitions may be allowed.
- Protection is implemented by the use of several boundary registers: low and high boundary registers, or a base register with a length.
- Fragmentation occurs if user programs cannot completely fill a partition, which is wasteful.

All the jobs that enter the system are put into queues. Each partition has its own job queue, as shown in the following figure. The job scheduler takes into account the memory requirements of each job and the available regions in determining which jobs are allocated memory. When a job is allocated space, it is loaded into a region and then competes for the CPU. When a job terminates, it releases its memory region, which the job scheduler may then fill with another job from the job queue. Another way is to allow a single unified queue, where the decision of choosing a job reflects the choice between a best-fit-only and a best-available-fit job memory allocation policy.

Figure: Multiprogramming - fixed partitions

6.6 Variable Partitions Multiprogramming

Variable partitions multiprogramming is also called multiprogramming with a variable number of tasks (MVT), or multiple contiguous variable partition allocation. In this scheme there are no fixed partitions; memory is divided into regions and allocated to programs as and when it is required.

Figure: Multiprogramming - variable partitions

MVT has the following properties:
- Variable partitions allow jobs to use as much space as they need (the limit being the complete memory space).
- There is no need to divide jobs into types, which reduces waste when jobs cannot fill a partition.
- However, wastage is still not completely eliminated.

The OS keeps a table indicating which parts of memory are available and which are occupied. Initially all memory is available for user programs and is considered as one large block of available memory, a hole. When a job arrives and needs memory, we search for a hole large enough for this job. If we find one, we allocate only as much as is needed, keeping the rest available to satisfy future requests. The most common algorithms for allocating memory are first-fit and best-fit (a sketch of first-fit allocation follows at the end of this section).

Figure: Allocating Memory

Once a block of memory has been allocated to a job, its program can be loaded into that space and executed. The minimal hardware support needed is the same as with MFT: two registers containing the upper and lower bounds of the region of memory allocated to the job. When the CPU scheduler selects a process, the dispatcher loads these bounds registers with the correct values. Since every address generated by the CPU is checked against these registers, we can protect other users' programs and data from being modified by the running process.
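As promised above, here is a minimal C sketch of first-fit allocation over a linked free list. The hole addresses and sizes are hypothetical, and a real allocator would additionally remove empty holes and coalesce adjacent ones; best-fit differs only in scanning the entire list for the smallest adequate hole.

    #include <stddef.h>
    #include <stdio.h>

    /* A hole in memory: start address and size (simplified free-list node). */
    struct hole {
        size_t start, size;
        struct hole *next;
    };

    /* First fit: take the first hole big enough, shrinking it in place.
       Returns the allocated start address, or (size_t)-1 on failure. */
    size_t first_fit(struct hole *free_list, size_t request) {
        for (struct hole *h = free_list; h != NULL; h = h->next) {
            if (h->size >= request) {
                size_t addr = h->start;
                h->start += request;   /* keep the rest of the hole available */
                h->size  -= request;
                return addr;
            }
        }
        return (size_t)-1;             /* no hole is large enough */
    }

    int main(void) {
        struct hole h2 = { 700, 100, NULL };
        struct hole h1 = { 200, 50,  &h2 };
        struct hole h0 = { 0,   100, &h1 };
        printf("80-unit job placed at %zu\n", first_fit(&h0, 80));  /* prints 0   */
        printf("90-unit job placed at %zu\n", first_fit(&h0, 90));  /* prints 700 */
        return 0;
    }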
6.7 Multiprogramming with Storage Swapping

Multiprogramming with storage swapping has the following features:
- It differs from the previous schemes, in which user programs remain in memory until completion.
- A job is swapped out of main storage (to secondary storage) if it requires service from some external routine, so that another process can use the CPU.
- A job may typically need to be swapped in and out many times before completion.
- Main storage can now hold many user programs active at the same time (swapping in and out of main memory).
- A bit more storage than necessary is allocated, so the process can grow during execution.
- The area on disk where this happens is the swap space (usually the /tmp area in Unix).
- Sometimes swap space is automatically allocated on process creation (hence, a fixed swap space per process).
- An average process in the middle of memory (after the system has reached equilibrium) will encounter half allocations and half deallocations (above and below it).
- The fifty percent rule: if the mean number of processes in memory is n, the mean number of holes is n/2. That is because adjacent holes are merged while adjacent processes are not (hence a process/hole asymmetry); this is just a heuristic.

6.8 Let us sum up

In this lesson we have learnt about real storage management strategies such as
a) contiguous storage allocation,
b) non-contiguous storage allocation,
c) fixed partition multiprogramming,
d) variable partition multiprogramming,
e) and multiprogramming with storage swapping.

6.9 Points for discussion

a) What happens if no job in the queue can fit the slot left by a departing job?
b) How do we keep track of memory in an implementation?

6.10 Model answers to Check your Progress

The answers for the questions given in 6.9 are:
a) This leads to the creation of holes in main storage that must be filled.
b) We can keep track of memory in an implementation by the use of
   i) linked lists,
   ii) the buddy system (dividing memory according to powers of 2), which can lead to big wastes (checkerboarding, or external fragmentation),
   iii) or bitmaps (a structure in which a 0 indicates that a block is free, and a 1 that it is occupied); see the sketch below.
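To make the bitmap idea in answer b) concrete, here is a minimal C sketch; the block count and the helper names are illustrative assumptions.

    #include <stdio.h>

    #define BLOCKS 64                       /* hypothetical number of allocation units */
    static unsigned char bitmap[BLOCKS / 8];/* one bit per block: 0 = free, 1 = in use */

    void set_used(int b) { bitmap[b / 8] |=  (1 << (b % 8)); }
    void set_free(int b) { bitmap[b / 8] &= ~(1 << (b % 8)); }
    int  is_used(int b)  { return (bitmap[b / 8] >> (b % 8)) & 1; }

    /* Find the first run of 'count' consecutive free blocks (linear scan). */
    int find_free_run(int count) {
        int run = 0;
        for (int b = 0; b < BLOCKS; b++) {
            run = is_used(b) ? 0 : run + 1;
            if (run == count) return b - count + 1;
        }
        return -1;                          /* no run of that length exists */
    }

    int main(void) {
        set_used(0); set_used(1); set_used(5);
        printf("3 free blocks start at block %d\n", find_free_run(3));  /* prints 2 */
        return 0;
    }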
6.11 Lesson-end activities

After learning this chapter, try to discuss among your friends and answer these questions to check your progress.
- Differentiate between MFT and MVT.
- What are the advantages of swapping?

6.12 References

a) Charles Crowley, Chapter 10, 11, 12 of "Operating Systems – A Design-Oriented Approach", Tata McGraw-Hill, 2001
b) H.M. Deitel, Chapter 7, 8, 9 of "Operating Systems", Second Edition, Pearson Education, 2001
c) Andrew S. Tanenbaum, Chapter 3 of "Modern Operating Systems", PHI, 1996
d) D.M. Dhamdhere, Chapter 15 of "Systems Programming and Operating Systems", Tata McGraw-Hill, 1997

LESSON 7 – VIRTUAL STORAGE

CONTENTS
7.1 Aims and Objectives
7.2 Introduction
 7.2.1 Overlays
 7.2.2 Dynamic Loading
7.3 Virtual storage management strategies
7.4 Steps in handling a page fault
7.5 Page replacement algorithms
 7.5.1 FIFO
 7.5.2 Optimal replacement
 7.5.3 Least recently used
7.6 Working sets
7.7 Demand paging
7.8 Page size
7.9 Let us Sum Up
7.10 Points for discussion
7.11 Model answers to check your progress
7.12 Lesson end Activities
7.13 References

7.1 Aims and Objectives

The aim of this lesson is to learn the concept of virtual storage management strategies. The objectives of this lesson are to make the student aware of the following concepts:
a) virtual storage management strategies
b) page replacement strategies
c) working sets
d) demand paging
e) and page size

7.2 Introduction

Virtual memory is a technique which allows the execution of processes that may not be completely in memory. The main advantage of this scheme is that user programs can be larger than physical memory. The ability to execute a program which is only partially in memory has many benefits: (a) users can write programs for a very large virtual address space; (b) more users can be run at the same time, with a corresponding increase in CPU utilization and throughput, but no increase in response time or turnaround time; (c) less I/O is needed to load or swap each user into memory, so each user runs faster.

7.2.1 Overlays

Overlaying is a technique which keeps in memory only those instructions and data that are currently needed; when other instructions are needed, they are loaded into space that was previously occupied by instructions that are no longer needed.

7.2.2 Dynamic Loading

Here a routine is not loaded until it is called. The advantage of dynamic loading is that an unused routine is never loaded. This scheme is particularly useful when large amounts of code are needed to handle infrequently occurring cases.

7.3 Virtual storage management strategies

There are three main strategies:
a) Fetch strategies – concerned with when a page or segment should be brought from secondary to primary storage.
b) Placement strategies – concerned with where in primary storage to place an incoming page or segment.
c) Replacement strategies – concerned with deciding which page or segment to displace to make room for an incoming page or segment when primary storage is already fully committed.

7.4 Steps in handling a page fault

When a page is not available in main memory, a page fault occurs, and the OS has to take the following steps to bring the required page from the secondary storage device into main memory:
a) First check whether the reference is valid or not, from the internal table of the process control block (PCB).
b) If the reference is valid and the page is not already loaded, bring the page in.
c) Find a free frame.
d) Read the desired page into the newly allocated frame.
e) Modify the internal table in the PCB to indicate that the page is now available.
f) Restart the instruction that was interrupted.

7.5 Page replacement algorithms

There are many page replacement algorithms; the three most important are FIFO, optimal replacement and least recently used. This subsection explains these three algorithms.

7.5.1 FIFO

The simplest page replacement algorithm is first in, first out: when a page must be replaced, the oldest page is chosen. For example, consider the page reference string 1, 5, 6, 1, 7, 1, 5, 7, 6, 1, 5, 1, 7. For a three-frame case, FIFO works as follows. Let all three frames initially be empty. The frame contents after each successive page fault are:

    1
    1 5
    1 5 6
    7 5 6
    7 1 6
    7 1 5
    6 1 5
    6 7 5

As you can see, FIFO produces eight page faults.
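As one possible implementation, this minimal C sketch counts FIFO page faults for the reference string used above (the frame limit of 16 is an arbitrary assumption); it can serve as a starting point for the lesson-end activity at the end of this lesson.

    #include <stdio.h>

    /* Count FIFO page faults for a reference string with 'nframes' frames. */
    int fifo_faults(const int *ref, int n, int nframes) {
        int frames[16];                  /* assumes nframes <= 16 */
        int used = 0, next = 0, faults = 0;
        for (int i = 0; i < n; i++) {
            int hit = 0;
            for (int j = 0; j < used; j++)
                if (frames[j] == ref[i]) { hit = 1; break; }
            if (!hit) {
                if (used < nframes) frames[used++] = ref[i];
                else {
                    frames[next] = ref[i];           /* evict the oldest page */
                    next = (next + 1) % nframes;
                }
                faults++;
            }
        }
        return faults;
    }

    int main(void) {
        int ref[] = {1, 5, 6, 1, 7, 1, 5, 7, 6, 1, 5, 1, 7};
        printf("FIFO faults: %d\n", fifo_faults(ref, 13, 3));  /* prints 8 */
        return 0;
    }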
7.5.2 Optimal replacement

In the optimal page replacement algorithm, we replace the page that will not be used for the longest period of time. For the same reference string 1, 5, 6, 1, 7, 1, 5, 7, 6, 1, 5, 1, 7 with 3 frames, the frame contents after each successive page fault are:

    1
    1 5
    1 5 6
    1 5 7
    1 5 6
    1 5 7

You can see that optimal replacement produces six page faults.

7.5.3 Least recently used

In most cases, predicting future page references is difficult, and hence implementing optimal replacement is difficult. There is therefore a need for a scheme that approximates optimal replacement. Least recently used (LRU) schemes approximate future use by past use: in an LRU scheme, we replace the page that has not been used for the longest period of time. For the reference string 1, 5, 6, 1, 7, 1, 5, 7, 6, 1, 5, 1, 7 with 3 frames, the frame contents after each successive page fault are:

    1
    1 5
    1 5 6
    1 7 6
    1 7 5
    6 7 5
    6 7 1
    6 5 1
    7 5 1

You can see that LRU produces nine page faults.

7.6 Working sets

If the number of frames allocated to a low-priority process falls below the minimum number required, we must suspend its execution. We should then page out its remaining pages, freeing all of its allocated frames. A process is thrashing if it is spending more time paging than executing. Thrashing can cause severe performance problems. To prevent thrashing, we must provide a process with as many frames as it needs. There are several techniques available to determine how many frames a process needs. The working set is a strategy which starts by looking at what a program is actually using: the set of pages referenced within the most recent Δ (delta) page references is the working set, denoted ws. The accuracy of the working set depends upon the selection of Δ. If Δ is too small, page faults will increase; if it is too large, it becomes very difficult to allocate the required frames. For example, with a window size of Δ = 11, the working set at one point in a reference string might be ws = {2, 3, 1, 4, 5}, while at a later point it is ws = {1, 3, 2}. (The working set contains exactly the pages the process has used within the last Δ references.) At its peak, such an example needs at least 5 frames; otherwise page faults will occur. In most cases we allocate the number of frames to a process depending on its average working-set size. Let Si be the average working-set size for process i. Then D = ΣSi is the total demand for frames, and process i needs Si frames. If the total demand is greater than the total number of available frames, thrashing will occur.

7.7 Demand paging

Demand paging is the most common virtual memory system. Demand paging is similar to a paging system with swapping: when we need a program, it is swapped in from the backing storage. There are also lazy swappers, which never swap a page into memory unless it is needed. The lazy swapper decreases the swap time and the amount of physical memory needed, allowing an increased degree of multiprogramming.

7.8 Page size

There is no single best page size. The designers of the operating system decide the page size for a given machine. Page sizes are usually powers of two, ranging from 2^8 to 2^12 bytes or words. The size of the pages affects the system in the following ways:
a) Decreasing the page size increases the number of pages and hence the size of the page table.
b) Memory is utilized better with smaller pages.
c) For reducing the I/O time, we need a smaller page size.
d) To minimize the number of page faults, we need a large page size.
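For comparison with the FIFO sketch given earlier, here is a minimal C rendering of LRU using last-use timestamps; this is one of several possible implementations, and the array bounds are illustrative.

    #include <stdio.h>

    /* Count LRU page faults: evict the frame whose page was used least recently. */
    int lru_faults(const int *ref, int n, int nframes) {
        int frames[16], last_use[16];    /* assumes nframes <= 16 */
        int used = 0, faults = 0;
        for (int i = 0; i < n; i++) {
            int hit = -1;
            for (int j = 0; j < used; j++)
                if (frames[j] == ref[i]) { hit = j; break; }
            if (hit >= 0) { last_use[hit] = i; continue; }   /* page hit */
            faults++;
            if (used < nframes) {
                frames[used] = ref[i];
                last_use[used++] = i;
            } else {
                int victim = 0;          /* frame with the oldest last_use */
                for (int j = 1; j < nframes; j++)
                    if (last_use[j] < last_use[victim]) victim = j;
                frames[victim] = ref[i];
                last_use[victim] = i;
            }
        }
        return faults;
    }

    int main(void) {
        int ref[] = {1, 5, 6, 1, 7, 1, 5, 7, 6, 1, 5, 1, 7};
        printf("LRU faults: %d\n", lru_faults(ref, 13, 3));  /* prints 9 */
        return 0;
    }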
7.9 Let us Sum Up

In this lesson we have learnt about the concept of virtual storage management strategies, such as page replacement strategies, working sets, demand paging, and page size.

7.10 Points for discussion

After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) What is the use of demand paging?
b) Differentiate between optimal replacement and LRU.

7.11 Model answers to check your progress

Demand paging is the most common virtual memory system. Demand paging is similar to a paging system with swapping: when we need a program, it is swapped in from the backing storage. There are also lazy swappers, which never swap a page into memory unless it is needed. The lazy swapper decreases the swap time and the amount of physical memory needed, allowing an increased degree of multiprogramming.

7.12 Lesson end Activities

For the reference string 1, 5, 6, 1, 7, 1, 5, 7, 6, 1, 5, 1, 7, find the page fault rate for the various page replacement algorithms with the number of frames set to 5.

7.13 References

Charles Crowley, Chapter 11, 12 of "Operating Systems – A Design-Oriented Approach", Tata McGraw-Hill, 2001
H.M. Deitel, Chapter 7, 8, 9 of "Operating Systems", Second Edition, Pearson Education, 2001
Andrew S. Tanenbaum, Chapter 3 of "Modern Operating Systems", PHI, 1996
D.M. Dhamdhere, Chapter 15 of "Systems Programming and Operating Systems", Tata McGraw-Hill, 1997

UNIT – III

LESSON – 8: PROCESSOR MANAGEMENT

CONTENTS
8.1 Aims and Objectives
8.2 Introduction
8.3 Preemptive Vs Non-Preemptive scheduling
8.4 Priorities
8.5 Deadline scheduling
8.6 Let us Sum Up
8.7 Points for discussion
8.8 Model answers to Check your Progress
8.9 Lesson end Activities
8.10 References

8.1 Aims and Objectives

A multiprogramming operating system allows more than one process to be loaded into executable memory at a time, and allows the loaded processes to share the CPU using time-multiplexing. Part of the reason for using multiprogramming is that the operating system itself is implemented as one or more processes, so there must be a way for the operating system and application processes to share the CPU. Another main reason is the need for processes to perform I/O operations in the normal course of computation. Since I/O operations ordinarily require orders of magnitude more time to complete than CPU instructions, multiprogramming systems allocate the CPU to another process whenever a process invokes an I/O operation.

Make sure your scheduling strategy is good enough with respect to the following criteria:
- Utilization/Efficiency: keep the CPU busy 100% of the time with useful work.
- Throughput: maximize the number of jobs processed per hour.
- Turnaround time: from the time of submission to the time of completion; minimize the time batch users must wait for output.
- Waiting time: the sum of the times spent in the ready queue; minimize this.
- Response time: the time from submission until the first response is produced; minimize response time for interactive users.
- Fairness: make sure each process gets a fair share of the CPU.

The aim of this lesson is to learn the concept of processor management and related issues. The objectives of this lesson are to make the student aware of the following concepts:
a) preemptive scheduling
b) non-preemptive scheduling
c) priorities
d) and deadline scheduling

8.2 Introduction

When one or more processes are runnable, the operating system must decide which one to run first.
The part of the operating system that makes this decision is called the scheduler; the algorithm it uses is called the scheduling algorithm. An operating system has three main CPU schedulers, namely the long-term scheduler, the short-term scheduler and the medium-term scheduler. The long-term scheduler determines which jobs are admitted to the system for processing: it selects jobs from the job pool and loads them into memory for execution. The short-term scheduler selects from among the jobs in memory which are ready to execute, and allocates the CPU to one of them. The medium-term scheduler helps to remove processes from main memory and from active contention for the CPU, and thus reduces the degree of multiprogramming. The CPU scheduler has another component called the dispatcher: the module that actually gives control of the CPU to the process selected by the short-term scheduler, which involves loading the registers of the process, switching to user mode and jumping to the proper location.

Before looking at specific scheduling algorithms, we should think about what the scheduler is trying to achieve. After all, the scheduler is concerned with deciding on policy, not providing a mechanism. Various criteria come to mind as to what constitutes a good scheduling algorithm. Some of the possibilities include:
1. Fairness – make sure each process gets its fair share of the CPU.
2. Efficiency (CPU utilization) – keep the CPU busy 100 percent of the time.
3. Response time [time from the submission of a request until the first response is produced] – minimize response time for interactive users.
4. Turnaround time [the interval from the time of submission to the time of completion] – minimize the time batch users must wait for output.
5. Throughput [number of jobs that are completed per unit time] – maximize the number of jobs processed per hour.
6. Waiting time – minimize the waiting time of jobs.

8.3 Preemptive Vs Non-Preemptive

The strategy of allowing processes that are logically runnable to be temporarily suspended is called preemptive scheduling; i.e., a scheduling discipline is preemptive if the CPU can be taken away. Preemptive algorithms are driven by the notion of prioritized computation: the process with the highest priority should always be the one currently using the processor. If a process is currently using the processor and a new process with a higher priority enters the ready list, the process on the processor should be removed and returned to the ready list until it is once again the highest-priority process in the system. Preemption leads to race conditions and necessitates semaphores, monitors, messages or some other sophisticated method for preventing them.

Run to completion is also called nonpreemptive scheduling; i.e., a scheduling discipline is nonpreemptive if, once a process has been given the CPU, the CPU cannot be taken away from that process. In short, non-preemptive algorithms are designed so that once a process enters the running state (is allocated the processor), it is not removed from the processor until it has completed its service time (or it explicitly yields the processor). On the other hand, such a policy of letting a process run as long as it wants would mean that some process computing π to a billion places could deny service to all other processes indefinitely.

8.4 Priorities

A priority is associated with each job, and the CPU is allocated to the job with the highest priority. Priorities are generally fixed numbers, such as 0 to 7 or 0 to 4095.
However, there is no general agreement on whether 0 is the highest or the lowest priority. Priority can be defined either internally or externally. Examples of internal priorities are time limits, memory requirements, the number of open files, average I/O burst time, CPU burst time, and so on. External priorities are given by the user. A major problem with priority scheduling algorithms is indefinite blocking, or starvation. A solution to this problem is aging: a technique of gradually increasing the priority of jobs that wait in the system for a long time.

8.5 Deadline scheduling

Certain jobs have to be completed within a specified time and hence have to be scheduled based on the deadline. If delivered in time, such jobs have high value; otherwise they have no value. Deadline scheduling is complex for the following reasons:
a) Giving the resource requirements of a job in advance is difficult.
b) A deadline job should be run without degrading other deadline jobs.
c) In the event of newly arriving jobs, it is very difficult to carefully plan resource requirements.
d) Resource management for deadline scheduling is a real overhead.

8.6 Let us Sum Up

In this lesson we have learnt about
a) preemptive scheduling
b) nonpreemptive scheduling
c) priorities
d) and deadline scheduling

8.7 Points for discussion

a) Why is CPU scheduling important?
b) How do we evaluate a scheduling algorithm?

8.8 Model answers to Check your Progress

The answers for the questions in 8.7 are discussed below.
a) Because it can have a big effect on resource utilization and the overall performance of the system.
b) There are many possible criteria:
   a. CPU utilization: keep CPU utilization as high as possible. (What is utilization, by the way?)
   b. Throughput: the number of processes completed per unit time.
   c. Turnaround time: the mean time from submission to completion of a process.
   d. Waiting time: the amount of time spent ready to run but not running.
   e. Response time: the time between submission of a request and the first response to the request.
   f. Scheduler efficiency: the scheduler doesn't perform any useful work, so any time it takes is pure overhead; we therefore need to make the scheduler very efficient.

8.9 Lesson end Activities

After learning this chapter, try to discuss among your friends and answer these questions to check your progress.
a) What is CPU scheduling?
b) Discuss deadline scheduling.
c) How do we evaluate a scheduling algorithm?

8.10 References

Charles Crowley, Chapter 8 of "Operating Systems – A Design-Oriented Approach", Tata McGraw-Hill, 2001
H.M. Deitel, Chapter 10 of "Operating Systems", Second Edition, Pearson Education, 2001
Andrew S. Tanenbaum, Chapter 11 of "Modern Operating Systems", PHI, 1996
D.M. Dhamdhere, Chapter 9 of "Systems Programming and Operating Systems", Tata McGraw-Hill, 1997

LESSON – 9: PROCESSOR SCHEDULING

CONTENTS
9.1 Aims and Objectives
9.2 Introduction
9.3 First In First Out (FIFO)
9.4 Round Robin Scheduling
9.5 Quantum size
9.6 Shortest Job First (SJF)
9.7 Shortest remaining time first (SRF)
9.8 Highest response ratio next (HRN)
9.9 Let us Sum Up
9.10 Points for discussion
9.11 Model answers to Check your Progress
9.12 Lesson-end Activities
9.13 References

9.1 Aims and Objectives

The aim of this lesson is to learn the concept of processor scheduling and scheduling algorithms.
The objectives of this lesson are to make the student aware of the following concepts:
a) FIFO
b) Round Robin
c) Shortest Job First
d) Shortest remaining time first
e) Highest response ratio next (HRN)

9.2 Introduction

Different algorithms have different properties and may favor one class of processes over another. In choosing the best algorithm, the characteristics explained in the previous lesson must be considered, including CPU utilization, throughput, turnaround time, waiting time, and response time. The five most important algorithms used in CPU scheduling are FIFO, Round Robin, Shortest Job First, Shortest remaining time first and Highest response ratio next (HRN). The following sections describe each of them.

9.3 First In First Out (FIFO)

FIFO (also called first-come, first-served, FCFS) serves jobs in the order in which they arrive. The same low-overhead idea is also used as a paging algorithm. To illustrate how it works, consider a supermarket that has enough shelves to display exactly k different products. One day, some company introduces a new convenience food: instant, freeze-dried, organic yogurt that can be reconstituted in a microwave oven. It is an immediate success, so our finite supermarket has to get rid of one old product in order to stock it. One possibility is to find the product that the supermarket has been stocking the longest and get rid of it, on the grounds that no one is interested anymore. In effect, the supermarket maintains a linked list of all the products it currently sells, in the order they were introduced. The new one goes on the back of the list; the one at the front of the list is dropped.

As a page replacement algorithm, the same idea is applicable. The operating system maintains a list of all pages currently in memory, with the page at the head of the list the oldest one and the page at the tail the most recent arrival. On a page fault, the page at the head is removed and the new page added to the tail of the list. When applied to stores, FIFO might remove mustache wax, but it might also remove flour, salt or butter. When applied to computers the same problems arise. For this reason, FIFO in its pure form is rarely used as a paging algorithm.

Consider, for example, the following scenario of four jobs with the corresponding CPU burst times, arriving in the order of their job numbers:

    Job    Burst time
    1      20
    2      10
    3      5
    4      15

The FCFS algorithm allocates the CPU to the jobs in the order of their arrival, and the following Gantt chart shows the result of execution:

    Job 1: 0-20 | Job 2: 20-30 | Job 3: 30-35 | Job 4: 35-50

The waiting times of the jobs are:
    Job 1: 0
    Job 2: 20
    Job 3: 30
    Job 4: 35
    Total waiting time = 85
Hence the average waiting time is 21.25.

The turnaround times of the jobs are:
    Job 1: 20
    Job 2: 30
    Job 3: 35
    Job 4: 50
    Total turnaround time = 135
Hence the average turnaround time is 33.75.

9.4 Round Robin Scheduling

One of the oldest, simplest, fairest and most widely used algorithms is Round Robin. Each process is assigned a time interval, called the quantum, which it is allowed to run. If the process is still running at the end of the quantum, the CPU is preempted and given to another process. If the process has blocked or finished before the quantum has elapsed, the CPU is switched anyway. Round robin is easy to implement: all the scheduler has to do is maintain a list of runnable processes. For example, with the run queue

    B, F, D, G, A

the scheduler runs B for one quantum; when B's quantum expires, B moves to the tail and the queue becomes

    F, D, G, A, B

The only interesting issue with round robin is the length of the quantum. Switching from one process to another requires a certain amount of time for doing the administration: saving and loading registers and memory maps, updating various tables and lists, and so on. Suppose that this process switch, or context switch, takes 5 msec, and suppose the quantum is set to 20 msec. With these parameters, after doing 20 msec of useful work, the CPU will have to spend 5 msec on process switching; twenty percent of the CPU time (5 out of every 25 msec) is wasted on administrative overhead. In general, with quantum q and switch time s, the fraction of CPU time lost to switching is s/(q + s).
Consider, for example, the following scenario of four jobs with the corresponding CPU burst times, arriving in the order of their job numbers:

    Job    Burst time
    1      20
    2      10
    3      5
    4      15

The RR algorithm allocates a quantum of time to each job in rotation, and the following Gantt chart shows the result of execution with a time quantum of 5:

    Job 1: 0-5 | Job 2: 5-10 | Job 3: 10-15 | Job 4: 15-20 | Job 1: 20-25 | Job 2: 25-30 | Job 4: 30-35 | Job 1: 35-40 | Job 4: 40-45 | Job 1: 45-50

The waiting times of the jobs are:
    Job 1: 0 + 15 + 10 + 5 = 30
    Job 2: 5 + 15 = 20
    Job 3: 10
    Job 4: 15 + 10 + 5 = 30
    Total waiting time = 90
Hence the average waiting time is 22.5.

The turnaround times of the jobs are:
    Job 1: 50
    Job 2: 30
    Job 3: 15
    Job 4: 45
    Total turnaround time = 140
Hence the average turnaround time is 35.

9.5 Quantum size

In the round robin scheduling algorithm, no process is allocated the CPU for more than one time quantum in a row. If its CPU burst exceeds a time quantum, it is preempted and put back in the ready queue. If the time quantum is too large, round robin becomes equivalent to FCFS. If the time quantum is too small, on the order of microseconds, round robin is called processor sharing, and it appears as if each of n processes has its own processor running at 1/n the speed of the real processor. The time quantum must be large with respect to the context switch time: if the context switch time is approximately 5 percent of the time quantum, then the CPU will spend about 5 percent of its time on context switching.

9.6 Shortest Job First (SJF)

Most of the above algorithms were designed for interactive systems. Now let us look at one that is especially appropriate for batch jobs where the run times are known in advance. In an insurance company, for example, people can predict quite accurately how long it will take to run a batch of 1000 claims, since similar work is done every day. When several equally important jobs are sitting in the input queue waiting to be started, the scheduler should use shortest job first. Here we find four jobs A, B, C and D with run times of 8, 4, 6 and 5 minutes respectively. By running them in that order, the turnaround time for A is 8 minutes, for B 12 minutes, for C 18 minutes and for D 23 minutes, for an average of 15.25 minutes. Now if we run them shortest job first, in the order B, D, C, A, the turnaround times become 4, 9, 15 and 23 minutes, for an average of 12.75 minutes.

Shortest job first is provably optimal. Consider the case of four jobs, with run times of a, b, c and d respectively. The first job finishes at time a, the second at time a + b, and so on, so the mean turnaround time is (4a + 3b + 2c + d)/4. It is clear that a contributes more to the average than the others, so it should be the shortest job, with b next, then c, and finally d as the longest, since d affects only its own turnaround time. The same argument applies equally well to any number of jobs.

Consider, for example, the following scenario of four jobs with the corresponding CPU burst times, arriving in the order of their job numbers:

    Job    Burst time
    1      20
    2      10
    3      5
    4      15

The algorithm allocates the shortest job first:

    Job 3: 0-5 | Job 2: 5-15 | Job 4: 15-30 | Job 1: 30-50

The waiting times of the jobs are:
    Job 1: 30
    Job 2: 5
    Job 3: 0
    Job 4: 15
    Total waiting time = 50
Hence the average waiting time is 12.5.

The turnaround times of the jobs are:
    Job 1: 50
    Job 2: 15
    Job 3: 5
    Job 4: 30
    Total turnaround time = 100
Hence the average turnaround time is 25.
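As a minimal sketch of nonpreemptive SJF for the batch case just computed (all four jobs are assumed to be available at time zero, an assumption carried over from the example):

    #include <stdio.h>
    #include <stdlib.h>

    /* Sort bursts ascending, then accumulate waiting and turnaround times. */
    int cmp(const void *a, const void *b) { return *(const int*)a - *(const int*)b; }

    int main(void) {
        int burst[] = {20, 10, 5, 15};
        int n = 4;
        qsort(burst, n, sizeof burst[0], cmp);  /* SJF: shortest burst first */

        int t = 0, total_wait = 0, total_turn = 0;
        for (int i = 0; i < n; i++) {
            total_wait += t;        /* a job waits while all shorter jobs run */
            t += burst[i];
            total_turn += t;        /* completion time = turnaround (arrival 0) */
        }
        printf("average waiting time:    %.2f\n", (double)total_wait / n);  /* 12.50 */
        printf("average turnaround time: %.2f\n", (double)total_turn / n);  /* 25.00 */
        return 0;
    }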
9.7 Shortest remaining time first (SRF)

Shortest job first may be either preemptive or non-preemptive. When a new job with a shorter CPU burst time arrives at the ready queue while a previous job is still executing, a preemptive shortest-job-first algorithm will preempt the currently executing job, while a non-preemptive shortest-job-first algorithm will allow the currently running job to finish its CPU burst. The preemptive shortest-job-first algorithm is also called shortest remaining time first. Consider, for example, the following scenario of four jobs with the corresponding CPU burst times and arrival times:

    Job    Burst time    Arrival time
    1      20            0
    2      10            2
    3      5             4
    4      15            19

The algorithm allocates the jobs as shown in the Gantt chart:

    Job 1: 0-2 | Job 2: 2-4 | Job 3: 4-9 | Job 2: 9-17 | Job 1: 17-19 | Job 4: 19-34 | Job 1: 34-50

The waiting times of the jobs (measured here from time zero rather than from each job's arrival) are:
    Job 1: 0 + 15 + 15 = 30
    Job 2: 2 + 5 = 7
    Job 3: 4
    Job 4: 19
    Total waiting time = 60
Hence the average waiting time is 15.

The turnaround times of the jobs (again measured from time zero, i.e., the completion times) are:
    Job 1: 50
    Job 2: 17
    Job 3: 9
    Job 4: 34
    Total turnaround time = 110
Hence the average turnaround time is 27.5.
In the meantime, execute other threads. Use computation to hide the latency of accessing memory. It turns into FCFS. Rule of thumb - want 80 percent of CPU bursts to be shorter than time quantum 9.12 Lesson end Activities Try to write C/C++ programs to implement FIFO, RR, SJF, SRT and HRN 59 9.13 References Charles Crowley, Chapter 8 of “Operating Systems – A Design-Oriented Approach”, Tata McGraw-Hill, 2001 H.M. Deitel, Chapter 10 of “Operating Systems”, Second Edition, Pearson Education, 2001 Andrew S. Tanenbaum, Chapter 11 of “Modern Operating Systems”, PHI, 1996 D.M. Dhamdhere, Chapter 9 of “Systems Programming and Operating Systems”, Tata McGraw-Hill, 1997 Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard. 60 LESSON – 10: DISTRIBUTED COMPUTING CONTENTS 10.1 Aims and Objectives 10.2 Introduction 10.3 Classification of sequential and parallel processing 10.4 Array Processors 10.4.1 History 10.5 Dataflow computer 10.6 Let us Sum Up 10.7 Points for discussion 10.8 Model Answer to Check your Progress 10.9 Lesson - end Activities 10.10 References 10.1 Aims and Objectives The aim of this lesson is to learn the concept of distribute computing and parallel computing. The objectives of this lesson are to make the student aware of the following concepts t) sequential processing u) parallel processing v) Array Processors w) and Dataflow computer 10.2 Introduction Parallel computing is the simultaneous execution of some combination of multiple instances of programmed instructions and data on multiple processors in order to obtain results faster. The idea is based on the fact that the process of solving a problem usually can be divided into smaller tasks, which may be carried out simultaneously with some coordination. The technique was first put to practical use by ILLIAC IV in 1976, fully a decade after it was conceived. A parallel computing system is a computer with more than one processor for parallel processing. In the past, each processor of a multiprocessing system always came in its own processor packaging, but recently-introduced multicore processors contain multiple logical processors in a single package. There are many different kinds of parallel computers. They are distinguished by the kind of interconnection between processors (known as "processing elements" or PEs) and memory. Parallel computers can be modelled as Parallel Random Access Machines (PRAMs). The PRAM model ignores the cost of interconnection between the constituent computing 61 units, but is nevertheless very useful in providing upper bounds on the parallel solvability of many problems. In reality the interconnection plays a significant role. The processors may communicate and cooperate in solving a problem or they may run independently, often under the control of another processor which distributes work to and collects results from them (a "processor farm"). Processors in a parallel computer may communicate with each other in a number of ways, including shared (either multiported or multiplexed, parallel supercomputers, NUMA vs. SMP vs. massively parallel computer systems, distributed computing (esp. computer clusters and grid computing). According to Amdahl's law, parallel processing is less efficient than one x-times-faster processor from a computational perspective. 
10.3 Classification of sequential and parallel processing

Flynn's taxonomy, one of the most accepted taxonomies of parallel architectures, classifies parallel (and serial) computers according to whether all processors execute the same instructions at the same time (single instruction/multiple data, SIMD) or whether each processor executes different instructions (multiple instruction/multiple data, MIMD). SISD (single instruction stream, single data stream) machines are uniprocessor computers that process one instruction at a time and are the most common architecture available today. Array processors essentially perform operations simultaneously on every element of an array, namely SIMD (single instruction stream, multiple data stream). Another category is multiple instruction stream, single data stream (MISD), which is not in practical use. Parallel processors or multiprocessors can handle multiple instruction streams and multiple data streams (MIMD).

Another major way to classify parallel computers is based on their memory architectures. Shared memory parallel computers have multiple processors accessing all available memory as a global address space. They can be further divided into two main classes based on memory access times: Uniform Memory Access (UMA), in which access times to all parts of memory are equal, and Non-Uniform Memory Access (NUMA), in which they are not. Distributed memory parallel computers also have multiple processors, but each of the processors can only access its own local memory; no global memory address space exists across them.

Parallel computing systems can also be categorized by the number of processors in them. Systems with thousands of such processors are known as massively parallel. Subsequently, there are what are referred to as "large scale" vs. "small scale" parallel processors. This depends on the scale of the system; e.g., a PC-based parallel system would generally be considered a small-scale system. Parallel processor machines are also divided into symmetric and asymmetric multiprocessors, depending on whether all the processors are the same or not (for instance, if only one is capable of running the operating system code while the others are less privileged). A variety of architectures have been developed for parallel processing. For example, a ring architecture has processors linked by a ring structure. Other architectures include hypercubes, fat trees, systolic arrays, and so on.

10.4 Array Processors

A vector processor, or array processor, is a CPU design that is able to run mathematical operations on multiple data elements simultaneously. This is in contrast to a scalar processor, which handles one element at a time. The vast majority of CPUs are scalar (or close to it). Vector processors were common in the scientific computing area, where they formed the basis of most supercomputers through the 1980s and into the 1990s, but general increases in performance and processor design saw the near disappearance of the vector processor as a general-purpose CPU. Today most commodity CPU designs include some vector processing instructions, typically known as SIMD (Single Instruction, Multiple Data); common examples include SSE and AltiVec.
Modern video game consoles and consumer computer-graphics hardware rely heavily on vector processing in their architecture. In 2000, IBM, Toshiba and Sony collaborated to create the Cell processor, consisting of one scalar processor and eight vector processors, for the Sony PlayStation 3. 10.4.1 History Vector processing was first worked on in the early 1960s at Westinghouse in their Solomon project. Solomon's goal was to dramatically increase math performance by using a large number of simple math co-processors (or ALUs) under the control of a single master CPU. The CPU fed a single common instruction to all of the ALUs, one per "cycle", but with a different data point for each one to work on. This allowed the Solomon machine to apply a single algorithm to a large data set, fed in the form of an array. In 1962 Westinghouse cancelled the project, but the effort was re-started at the University of Illinois as the ILLIAC IV. Their version of the design originally called for a 1 GFLOPS machine with 256 ALUs, but when it was finally delivered in 1972 it had only 64 ALUs and could reach only 100 to 150 MFLOPS. Nevertheless it showed that the basic concept was sound, and when used on data-intensive applications, such as computational fluid dynamics, the "failed" ILLIAC was the fastest machine in the world. Note that the ILLIAC approach of using separate ALUs for each data element is not common to later designs, and is often referred to under a separate category, massively parallel computing. The first successful implementations of vector processing appear to be the CDC STAR-100 and the Texas Instruments Advanced Scientific Computer (ASC). The basic ASC (i.e., "one pipe") ALU used a pipeline architecture which supported both scalar and vector computations, with peak performance reaching approximately 20 MFLOPS, readily achieved when processing long vectors. Expanded ALU configurations supported "two pipes" or "four pipes" with a corresponding 2X or 4X performance gain. Memory bandwidth was sufficient to support these expanded modes. The STAR was otherwise slower than CDC's own supercomputers like the CDC 7600, but at data-related tasks it could keep up while being much smaller and less expensive. However, the machine also took considerable time decoding the vector instructions and getting ready to run the process, so it required very specific data sets to work on before it actually sped anything up. The vector technique was first fully exploited in the famous Cray-1. Instead of leaving the data in memory like the STAR and ASC, the Cray design had eight "vector registers" which held sixty-four 64-bit words each. The vector instructions were applied between registers, which is much faster than talking to main memory. In addition, the design had completely separate pipelines for different instructions; for example, addition/subtraction was implemented in different hardware than multiplication. This allowed a batch of vector instructions themselves to be pipelined, a technique they called vector chaining. The Cray-1 normally had a performance of about 80 MFLOPS, but with up to three chains running it could peak at 240 MFLOPS – a respectable number even today. Other examples followed. CDC tried to re-enter the high-end market again with its ETA-10 machine, but it sold poorly, and the company took that as an opportunity to leave the supercomputing field entirely.
Various Japanese companies (Fujitsu, Hitachi and NEC) introduced register-based vector machines similar to the Cray-1, typically being slightly faster and much smaller. Oregon-based Floating Point Systems (FPS) built add-on array processors for minicomputers, later building their own minisupercomputers. However, Cray continued to be the performance leader, continually beating the competition with a series of machines that led to the Cray-2, Cray X-MP and Cray Y-MP. Since then the supercomputer market has focused much more on massively parallel processing rather than better implementations of vector processors. However, recognizing the benefits of vector processing, IBM developed the Virtual Vector Architecture for use in supercomputers, coupling several scalar processors to act as a vector processor. Today the average computer at home crunches as much data watching a short QuickTime video as did all of the supercomputers in the 1970s. Vector processor elements have since been added to almost all modern CPU designs, although they are typically referred to as SIMD. In these implementations the vector processor runs beside the main scalar CPU, and is fed data from programs that know it is there. 10.5 Dataflow computer Dataflow computing is a subject of considerable interest, since this class of computer architectures exposes a high level of parallelism. The dataflow computer system VSPD-1 presented here is a research prototype of a static dataflow architecture according to Veen's classification. Static partitioning, linking, and distribution of actors among processing modules must be done at the stage of development of a dataflow application for VSPD-1. VSPD-1 is designed to scale up to 16 processing nodes. The research prototype of VSPD-1, implemented at the Computer Engineering Department, LETI (Leningrad Institute of Electrical Engineering), consists of five processing modules (PM). Each VSPD-1 processing module consists of: - a microcomputer MS11200.5 with a 1 MIPS microprocessor and 32-KB RAM; - PROM for a loader and a kernel of a dataflow run-time system (called here the monitor); - communication Qbus-Qbus adapters (CA); - an I/O processor (IOP) that supports up to eight DMA channels for internode communication; - an auxiliary control unit that implements most of the operations on dataflow control. Restrictions of physical realization motivated the following configuration of the 5-module system: four processing modules are configured in a ring, and the fifth module is connected to all other PMs. The fifth PM uses a communication adapter that connects the processing module to a host computer DVK-4. The structure of the VSPD processing module is presented in Fig. 1. Dataflow computation is realized at the level of labeled code fragments called actors. An actor is ready and can be executed when it has received all the data (operands) required for its execution. An application for VSPD-1 must be written in a specific dataflow style where actors are programmed in Pascal with macros in Macro-11 assembler. Each labeled code segment that specifies an actor ends with an assembler macro which passes a pointer to a destination list to the kernel and invokes it. The kernel sends data produced by the actor to the destination actors specified in the list. A label of the actor serves as its starting address (activation point). Addresses of destination actors that will receive the result tokens are stored in a destination table that actually holds the dataflow graph of the application.
A dataflow program also includes a table of starting addresses of actors, and a table of ready-bit vectors. A ready-bit vector indicates, for each of an actor's inputs, whether a token has arrived at that input. Before computation starts, the ready-bit table contains the initial mapping of tokens to actor inputs. A special mask stored together with a ready-bit vector indicates the inputs which are always ready (constant or not in use). The system supports explicit and implicit transfer of data from source actors to destination actors. With explicit transfer, data that are sent to a destination actor are followed by a result token, whose role is to indicate that data has been sent to the destination. A token is 16 bits wide and consists of the following fields: the data type tag (T) that indicates whether the data item sent is a vector or a scalar; the number of the destination processor module (P); the number of the destination actor (A); and the number of the actor input to which the token is directed (I). In this way, parameters for explicit data transfer consist of two words: a 16-bit token and a pointer to a data structure to be sent to the destination. The format of a destination list entry is shown in Fig. 2. Implicit data transfer is used for local data flow between actors within a processing module, and it is implemented through shared variables. Tokens are used for synchronization of parallel processes in processing modules. Tokens that follow data form a control flow, since each token indicates which destination actor can be executed after a source actor. The actor is ready when it has collected tokens on all its inputs. The amount of token traffic can be less than that of the data flow. For example, when a source actor sends more than one data item to a destination actor, it is enough to send only one token after all data are sent in vector form. If data is directed to remote actors located on the same remote node, the data and all tokens are sent in one message. Token flow and data flow are controlled by a distributed software run-time system together with the auxiliary hardware Control Unit (CU) in each processing node. The auxiliary Control Unit consists of a register file (RF), a table memory (TM), an operational unit (OU), a ready-actor queue (RAQ), a microprogrammed controller (MPC), an address selector (AS) and bus adapters (BA). The table memory is split into three regions that are used to store three tables: the table of starting addresses, the table of ready-bit vectors, and the table of destination addresses. The structure of the CU is shown in Fig. 3. The auxiliary Control Unit receives a token and sets the appropriate ready bit in the ready-bit table to indicate that the token has arrived. If the destination actor is ready, its number is inserted into the ready-actor queue. On a request from the software kernel, the CU fetches a ready actor (if any) from the head of the queue and looks up the starting address table. The starting address of the ready actor is returned to the kernel, which invokes the actor. After the actor completes, the kernel resets the actor's ready-bit vector masked by the always-ready mask, and sends results and tokens to destination actors according to the destination list stored in the destination address table of the CU. Sending of data and tokens to remote processing modules is performed by an I/O co-processor and communication adapters. A message that includes data (vector or scalar) is passed to the destination processing unit via DMA.
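The token layout and the firing rule described above can be sketched in C. This is a loose illustration rather than the actual VSPD-1 code: the source gives only the field list (T, P, A, I) and the 16-bit total, so the individual field widths and all names below are assumptions.

    #include <stdint.h>

    /* A 16-bit token: type tag, destination module, actor, input.
     * The 1/3/8/4 split is an assumed example layout. */
    struct token {
        uint16_t type  : 1;  /* T: vector or scalar data          */
        uint16_t pm    : 3;  /* P: destination processing module  */
        uint16_t actor : 8;  /* A: destination actor number       */
        uint16_t input : 4;  /* I: destination actor input number */
    };

    /* Firing rule: an actor is ready when every input has a token;
     * inputs covered by the always-ready mask (constants or unused
     * inputs) count as ready. */
    int actor_is_ready(uint16_t ready_bits, uint16_t always_ready_mask,
                       uint16_t all_inputs_mask)
    {
        return ((ready_bits | always_ready_mask) & all_inputs_mask)
               == all_inputs_mask;
    }

When the check succeeds, the actor's number would be placed on the ready-actor queue, mirroring the Control Unit behaviour described above.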
When data transmission completes, a corresponding token is passed to the auxiliary Control Unit of the destination processing module. The DMA engine is controlled by the I/O co-processor. The VSPD-1 system can be used, in particular, as a functional accelerator for a minicomputer with a PDP-11 architecture for applications that require high-performance computing, such as simulation of complex dynamic objects in real time. 10.6 Let us Sum Up In this lesson we have learnt about a) sequential and parallel processing b) array processors c) dataflow computers 10.7 Points for discussion Discuss the following: a) the array or vector processor b) the working of a dataflow computer 10.8 Model Answer to Check your Progress The answer for question a) in section 10.7 can be written as follows. A vector processor, or array processor, is a CPU design that is able to run mathematical operations on multiple data elements simultaneously. This is in contrast to a scalar processor, which handles one element at a time. The vast majority of CPUs are scalar (or close to it). Vector processors were common in the scientific computing area, where they formed the basis of most supercomputers through the 1980s and into the 1990s, but general increases in performance and processor design saw the near disappearance of the vector processor as a general-purpose CPU. Today most commodity CPU designs include some vector processing instructions, typically known as SIMD (Single Instruction, Multiple Data); common examples include SSE and AltiVec. Modern video game consoles and consumer computer-graphics hardware rely heavily on vector processing in their architecture. In 2000, IBM, Toshiba and Sony collaborated to create the Cell processor, consisting of one scalar processor and eight vector processors, for the Sony PlayStation 3. 10.9 Lesson - end Activities After learning this chapter, try to discuss among your friends and answer these questions to check your progress. a) Discuss about sequential processing b) Discuss about parallel processing 10.10 References Charles Crowley, Chapter 6 of “Operating Systems – A Design-Oriented Approach”, Tata McGraw-Hill, 2001 H.M. Deitel, Chapter 11 of “Operating Systems”, Second Edition, Pearson Education, 2001 Andrew S. Tanenbaum, Chapters 9, 10, 11, 12, 13 of “Modern Operating Systems”, PHI, 1996 D.M. Dhamdhere, Chapter 19 of “Systems Programming and Operating Systems”, Tata McGraw-Hill, 1997 LESSON – 11: MULTIPROCESSING AND FAULT TOLERANCE CONTENTS 11.1 Aims and Objectives 11.2 Introduction 11.3 Multiprocessing techniques 11.3.1 Instruction and data streams 11.3.2 Processor coupling 11.3.3 SISD multiprocessing 11.3.4 SIMD multiprocessing 11.3.5 MISD multiprocessing 11.3.6 MIMD multiprocessing 11.4 Fault-tolerant 11.4.1 Fault Tolerance Requirements 11.4.2 Fault-tolerance by replication, Redundancy and Diversity 11.5 Let us sum up 11.6 Points for discussion 11.7 Model answers to Check your Progress 11.8 Lesson - end activities 11.9 References 11.1 Aims and Objectives In this lesson we will learn about the introduction to multiprocessing and fault tolerance. The objectives of this lesson are to make the students aware of the following: a) multiprocessing techniques b) instruction and data streams c) processor coupling d) SISD, SIMD, MISD and MIMD e) fault tolerance and its requirements f) fault-tolerance by replication, redundancy and diversity 11.2 Introduction Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system.
The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them. There are many variations on this basic theme, and the definition of multiprocessing can vary with context, mostly as a function of how CPUs are defined (multiple cores on one die, multiple chips in one package, multiple packages in one system unit, etc.). Multiprocessing sometimes refers to the execution of multiple concurrent software processes in a system as opposed to a single process at any one instant. However, the term multiprogramming is more appropriate to describe this concept, which is implemented mostly in software, whereas multiprocessing is more appropriate to describe the use of multiple hardware CPUs. A system can be both multiprocessing and multiprogramming, only one of the two, or neither of the two. In a multiprocessing system, all CPUs may be equal, or some may be reserved for special purposes. A combination of hardware and operating-system software design considerations determines the symmetry (or lack thereof) in a given system. For example, hardware or software considerations may require that only one CPU respond to all hardware interrupts, whereas all other work in the system may be distributed equally among CPUs; or execution of kernel-mode code may be restricted to only one processor (either a specific processor, or only one processor at a time), whereas user-mode code may be executed in any combination of processors. Multiprocessing systems are often easier to design if such restrictions are imposed, but they tend to be less efficient than systems in which all CPUs are utilized equally. 11.3 Multiprocessing techniques Systems that treat all CPUs equally are called symmetric multiprocessing (SMP) systems. In systems where all CPUs are not equal, system resources may be divided in a number of ways, including asymmetric multiprocessing (ASMP), non-uniform memory access (NUMA) multiprocessing, and clustered multiprocessing. 11.3.1 Instruction and data streams In multiprocessing, the processors can be used to execute a single sequence of instructions in multiple contexts (single-instruction, multiple-data or SIMD, often used in vector processing), multiple sequences of instructions in a single context (multiple-instruction, single-data or MISD, used for redundancy in fail-safe systems and sometimes applied to describe pipelined processors or hyperthreading), or multiple sequences of instructions in multiple contexts (multiple-instruction, multiple-data or MIMD). 11.3.2 Processor coupling Tightly-coupled multiprocessor systems contain multiple CPUs that are connected at the bus level. These CPUs may have access to a central shared memory (SMP or UMA), or may participate in a memory hierarchy with both local and shared memory (NUMA). The IBM p690 Regatta is an example of a high-end SMP system. Intel Xeon processors dominated the multiprocessor market for business PCs and were the only x86 option until the release of AMD's Opteron range of processors in 2003. Both ranges of processors had their own onboard cache but provided access to shared memory; the Xeon processors via a common pipe and the Opteron processors via independent pathways to the system RAM. Chip multiprocessing, also known as multi-core computing, involves more than one processor placed on a single chip and can be thought of as the most extreme form of tightly-coupled multiprocessing. Mainframe systems with multiple processors are often tightly-coupled.
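As a practical aside, on most Unix-like systems a program can query how many processors are online, which is often the first step in deciding how many threads to run on either kind of system. A minimal sketch follows; note that _SC_NPROCESSORS_ONLN is a widespread glibc/BSD extension rather than part of base POSIX.

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Query the number of processors currently online. */
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        if (n < 1)
            n = 1;  /* fall back to one CPU if the query fails */
        printf("online processors: %ld\n", n);
        return 0;
    }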
Loosely-coupled multiprocessor systems (often referred to as clusters) are based on multiple standalone single- or dual-processor commodity computers interconnected via a high-speed communication system (Gigabit Ethernet is common). A Linux Beowulf cluster is an example of a loosely-coupled system. Tightly-coupled systems perform better and are physically smaller than loosely-coupled systems, but have historically required greater initial investments and may depreciate rapidly; nodes in a loosely-coupled system are usually inexpensive commodity computers and can be recycled as independent machines upon retirement from the cluster. Power consumption is also a consideration. Tightly-coupled systems tend to be much more energy efficient than clusters. This is because considerable economies can be realized by designing components to work together from the beginning in tightly-coupled systems, whereas loosely-coupled systems use components that were not necessarily intended specifically for use in such systems. 11.3.3 SISD multiprocessing Conventional uniprocessors, such as the early 8-bit and 16-bit designs, are SISD machines: a single instruction stream operates on a single data stream. 11.3.4 SIMD multiprocessing SIMD multiprocessing is well suited to parallel or vector processing, in which a very large set of data can be divided into parts that are individually subjected to identical but independent operations. A single instruction stream directs the operation of multiple processing units to perform the same manipulations simultaneously on potentially large amounts of data. For certain types of computing applications, this type of architecture can produce enormous increases in performance, in terms of the elapsed time required to complete a given task. However, a drawback to this architecture is that a large part of the system falls idle when applications or system tasks are executed that cannot be divided into units that can be processed in parallel. Additionally, applications must be carefully and specially written to take maximum advantage of the architecture, and often special optimizing compilers designed to produce code specifically for this environment must be used. Some compilers in this category provide special constructs or extensions to allow programmers to directly specify operations to be performed in parallel (e.g., DO FOR ALL statements in the version of FORTRAN used on the ILLIAC IV, which was a SIMD multiprocessing supercomputer). SIMD multiprocessing finds wide use in certain domains such as computer simulation, but is of little use in general-purpose desktop and business computing environments. 11.3.5 MISD multiprocessing MISD multiprocessing offers mainly the advantage of redundancy, since multiple processing units perform the same tasks on the same data, reducing the chances of incorrect results if one of the units fails. MISD architectures may involve comparisons between processing units to detect failures. Apart from the redundant and fail-safe character of this type of multiprocessing, it has few advantages, and it is very expensive. It does not improve performance. It can be implemented in a way that is transparent to software. 11.3.6 MIMD multiprocessing MIMD multiprocessing architecture is suitable for a wide variety of tasks in which completely independent and parallel execution of instructions touching different sets of data can be put to productive use. For this reason, and because it is easy to implement, MIMD predominates in multiprocessing.
Processing is divided into multiple threads, each with its own hardware processor state, within a single software-defined process or within multiple processes. Insofar as a system has multiple threads awaiting dispatch (either system or user threads), this architecture makes good use of hardware resources. MIMD does raise issues of deadlock and resource contention, however, since threads may collide in their access to resources in an unpredictable way that is difficult to manage efficiently. MIMD requires special coding in the operating system of a computer but does not require application changes unless the applications themselves use multiple threads (MIMD is transparent to single-threaded applications under most operating systems, if the applications do not voluntarily relinquish control to the OS). Both system and user software may need to use software constructs such as semaphores (also called locks or gates) to prevent one thread from interfering with another if they should happen to cross paths in referencing the same data. This gating or locking process increases code complexity, lowers performance, and greatly increases the amount of testing required, although not usually enough to negate the advantages of multiprocessing. Similar conflicts can arise at the hardware level between CPUs (cache contention and corruption, for example), and must usually be resolved in hardware, or with a combination of software and hardware (e.g., cache-clear instructions). 11.4 Fault-tolerant Fault-tolerant describes a computer system or component designed so that, in the event that a component fails, a backup component or procedure can immediately take its place with no loss of service. Fault tolerance can be provided with software, or embedded in hardware, or provided by some combination. In the software implementation, the operating system provides an interface that allows a programmer to "checkpoint" critical data at pre-determined points within a transaction. In the hardware implementation (for example, with Stratus and its VOS operating system), the programmer does not need to be aware of the fault-tolerant capabilities of the machine. At a hardware level, fault tolerance is achieved by duplexing each hardware component. Disks are mirrored. Multiple processors are "lock-stepped" together and their outputs are compared for correctness. When an anomaly occurs, the faulty component is determined and taken out of service, but the machine continues to function as usual. Fault-tolerance or graceful degradation is the property that enables a system (often computer-based) to continue operating properly in the event of the failure of (or one or more faults within) some of its components. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naively-designed system in which even a small failure can cause total breakdown. Fault-tolerance is particularly sought-after in high-availability or life-critical systems. Fault-tolerance is not just a property of individual machines; it may also characterise the rules by which they interact. For example, the Transmission Control Protocol (TCP) is designed to allow reliable two-way communication in a packet-switched network, even in the presence of communications links which are imperfect or overloaded.
It does this by requiring the endpoints of the communication to expect packet loss, duplication, reordering and corruption, so that these conditions do not damage data integrity and only reduce throughput by a proportional amount. Data formats may also be designed to degrade gracefully. HTML, for example, is designed to be forward-compatible, allowing new HTML entities to be ignored by Web browsers which do not understand them without causing the document to be unusable. Recovery from errors in fault-tolerant systems can be characterised as either roll-forward or roll-back. When the system detects that it has made an error, roll-forward recovery takes the system state at that time and corrects it, to be able to move forward. Roll-back recovery reverts the system state back to some earlier, correct version, for example using checkpointing, and moves forward from there. Roll-back recovery requires that the operations between the checkpoint and the detected erroneous state can be made idempotent. Some systems make use of both roll-forward and roll-back recovery for different errors or different parts of one error. Within the scope of an individual system, fault-tolerance can be achieved by anticipating exceptional conditions and building the system to cope with them, and, in general, aiming for self-stabilization so that the system converges towards an error-free state. However, if the consequences of a system failure are catastrophic, or the cost of making it sufficiently reliable is very high, a better solution may be to use some form of duplication. In any case, if the consequence of a system failure is catastrophic, the system must be able to use reversion to fall back to a safe mode. This is similar to roll-back recovery but can be a human action if humans are present in the loop. 11.4.1 Fault Tolerance Requirements The basic characteristics of fault tolerance require: 1. No single point of failure 2. No single point of repair 3. Fault isolation to the failing component 4. Fault containment to prevent propagation of the failure 5. Availability of reversion modes In addition, fault-tolerant systems are characterized in terms of both planned service outages and unplanned service outages. These are usually measured at the application level and not just at a hardware level. The figure of merit is called availability and is expressed as a percentage. A five-nines system would therefore statistically provide 99.999% availability, which corresponds to roughly five minutes of downtime per year. 11.4.2 Fault-tolerance by replication, Redundancy and Diversity Providing spare components addresses the first fundamental characteristic of fault-tolerance in three ways: Replication: Providing multiple identical instances of the same system or subsystem, directing tasks or requests to all of them in parallel, and choosing the correct result on the basis of a quorum; Redundancy: Providing multiple identical instances of the same system and switching to one of the remaining instances in case of a failure (failover); Diversity: Providing multiple different implementations of the same specification, and using them like replicated systems to cope with errors in a specific implementation. A redundant array of independent disks (RAID) is an example of a fault-tolerant storage device that uses data redundancy. A lockstep fault-tolerant machine uses replicated elements operating in parallel. At any time, all the replications of each element should be in the same state. The same inputs are provided to each replication, and the same outputs are expected.
The outputs of the replications are compared using a voting circuit. A machine with two replications of each element is termed Dual Modular Redundant (DMR). The voting circuit can then only detect a mismatch, and recovery relies on other methods. A machine with three replications of each element is termed Triple Modular Redundant (TMR). The voting circuit can determine which replication is in error when a two-to-one vote is observed. In this case, the voting circuit can output the correct result and discard the erroneous version. After this, the internal state of the erroneous replication is assumed to be different from that of the other two, and the voting circuit can switch to a DMR mode. This model can be applied to any larger number of replications. Lockstep fault-tolerant machines are most easily made fully synchronous, with each gate of each replication making the same state transition on the same edge of the clock, and the clocks to the replications being exactly in phase. However, it is possible to build lockstep systems without this requirement. Bringing the replications into synchrony requires making their internal stored states the same. They can be started from a fixed initial state, such as the reset state. Alternatively, the internal state of one replica can be copied to another replica. One variant of DMR is pair-and-spare. Two replicated elements operate in lockstep as a pair, with a voting circuit that detects any mismatch between their operations and outputs a signal indicating that there is an error. Another pair operates exactly the same way. A final circuit selects the output of the pair that does not proclaim that it is in error. Pair-and-spare requires four replicas rather than the three of TMR, but has been used commercially. 11.5 Let us sum up In this lesson we have learnt about a) the multiprocessing techniques b) fault tolerance 11.6 Points for discussion Discuss the following: a) SISD b) SIMD c) MISD d) MIMD 11.7 Model answers to check your progress The questions given in section 11.6 are explained here. In computing, SISD (Single Instruction, Single Data) is a term referring to an architecture in which a single processor executes a single instruction stream, to operate on data stored in a single memory. This corresponds to the von Neumann architecture. In computing, SIMD (Single Instruction, Multiple Data) is a technique employed to achieve data-level parallelism, as with a vector processor. First made popular in large-scale supercomputers (in contrast to MIMD parallelization), smaller-scale SIMD operations have now become widespread in personal computer hardware. In computing, MISD (Multiple Instruction, Single Data) is a type of parallel computing architecture where many functional units perform different operations on the same data. Pipeline architectures belong to this type, though a purist might say that the data is different after processing by each stage in the pipeline. Fault-tolerant computers executing the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type. Not many instantiations of this architecture exist, as MIMD and SIMD are often more appropriate for common data-parallel techniques. Specifically, they allow better scaling and use of computational resources than MISD does. In computing, MIMD (Multiple Instruction stream, Multiple Data stream) is a technique employed to achieve parallelism.
Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data. MIMD architectures may be used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, and communication switches. MIMD machines can be of either shared memory or distributed memory categories. These classifications are based on how MIMD processors access memory. Shared memory machines may be of the bus-based, extended, or hierarchical type. Distributed memory machines may have hypercube or mesh interconnection schemes. 11.8 Lesson - end activities After learning this chapter, try to discuss among your friends and answer these questions to check your progress. a) Discuss about various multiprocessing techniques b) Discuss about fault tolerance 11.9 References a) Charles Crowley, Chapters 5, 6 of “Operating Systems – A Design-Oriented Approach”, Tata McGraw-Hill, 2001 b) H.M. Deitel, Chapter 11 of “Operating Systems”, Second Edition, Pearson Education, 2001 c) Andrew S. Tanenbaum, Chapters 9, 10, 11, 12, 13 of “Modern Operating Systems”, PHI, 1996 d) D.M. Dhamdhere, Chapters 18, 19 of “Systems Programming and Operating Systems”, Tata McGraw-Hill, 1997 UNIT – IV LESSON – 12: DEVICE AND DISK MANAGEMENT CONTENTS 12.1 Aims and Objectives 12.2 Introduction 12.3 Need for Disk Scheduling 12.4 Disk Scheduling Strategies 12.4.1 First Come First Served (FCFS) 12.4.2 Shortest Seek Time First (SSTF) 12.4.3 SCAN 12.4.4 Circular SCAN (C-SCAN) 12.5 RAM (Random access memory) 12.5.1 Overview 12.5.2 Recent developments 12.5.3 Memory wall 12.5.4 DRAM packaging 12.6 Optical Disks 12.7 Let us sum up 12.8 Points for discussion 12.9 Model answers to Check your Progress 12.10 Lesson - end activities 12.11 References 12.1 Aims and Objectives In this lesson we will learn about the introduction to disk scheduling strategies, RAM and optical disks. The objective of this lesson is to make sure that the student understands the following: a) Disk Scheduling, b) First Come First Served (FCFS), c) Shortest Seek Time First (SSTF) d) SCAN e) Circular SCAN (C-SCAN) f) RAM and g) Optical Disks 12.2 Introduction In multiprogramming systems several different processes may want to use the system's resources simultaneously. For example, processes will contend to access an auxiliary storage device such as a disk. The disk drive needs some mechanism to resolve this contention, sharing the resource between the processes fairly and efficiently. A magnetic disk consists of a collection of platters which rotate about a central spindle. These platters are metal disks covered with magnetic recording material on both sides. Each disk surface is divided into concentric circles called tracks. Each track is in turn divided into sectors, each typically containing 512 bytes. While reading and writing, the head moves over the surface of the platters until it finds the track and sector it requires. This is like finding someone's home by first finding the street (track) and then the particular house number (sector). There is one head for each surface on which information is stored, each on its own arm. In most systems the arms are connected together so that the heads move in unison, with each head over the same track on each surface. The term cylinder refers to the collection of all tracks which are under the heads at any time.
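Given this geometry, a linear block number can be converted into a (cylinder, head, sector) address with simple integer arithmetic. The C sketch below shows the classic calculation; the geometry constants are invented example values, and modern drives actually expose logical block addresses and remap internally.

    /* Map a logical block address to cylinder/head/sector.
     * The geometry values are made-up example numbers. */
    #define SECTORS_PER_TRACK  63
    #define HEADS_PER_CYLINDER 16

    struct chs { long cylinder; int head; int sector; };

    struct chs lba_to_chs(long lba)
    {
        struct chs a;
        a.cylinder = lba / (HEADS_PER_CYLINDER * SECTORS_PER_TRACK);
        a.head     = (int)((lba / SECTORS_PER_TRACK) % HEADS_PER_CYLINDER);
        a.sector   = (int)(lba % SECTORS_PER_TRACK) + 1; /* sectors count from 1 */
        return a;
    }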
In order to satisfy an I/O request the disk controller must first move the head to the correct track and sector. Moving the head between cylinders takes a relatively long time, so in order to maximize the number of I/O requests which can be satisfied, the scheduling policy should try to minimize the movement of the head. On the other hand, minimizing head movement by always satisfying the request of the closest location may mean that some requests have to wait a long time. Thus, there is a trade-off between throughput (the average number of requests satisfied in unit time) and response time (the average time between a request arriving and it being satisfied). 12.3 Need for Disk Scheduling Access time has two major components, namely seek time and rotational latency. Seek time is the time for the disk arm to move the heads to the cylinder containing the desired sector. Rotational latency is the additional time spent waiting for the disk to rotate the desired sector to the disk head. In order to have a fast access time we have to minimize the seek time, which is approximately proportional to the seek distance. Disk bandwidth is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer. The operating system is responsible for using the disk drives efficiently, so as to achieve fast access times and high disk bandwidth. This in turn needs good disk scheduling. 12.4 Disk Scheduling Strategies Three criteria to measure strategies are throughput, mean response time, and variance of response times. Throughput is the number of requests serviced per unit of time. Mean response time is the average time spent waiting for a request to be serviced. Variance of response times is a measure of the predictability of response times. Hence the overall goals of the disk scheduling strategies are to maximize the throughput and minimize both the response time and the variance of response times. 12.4.1 First Come First Served (FCFS) The disk controller processes the I/O requests in the order in which they arrive, thus moving backwards and forwards across the surface of the disk to get to the next requested location each time. Since no reordering of requests takes place, the head may move almost randomly across the surface of the disk. This policy is fair to requesters but pays little regard to throughput. The figure depicts requests for cylinders 63, 33, 72, 47, 8, 99, 74, 52, 75. If the requests arrive in that sequence, they are also serviced in that sequence, causing the head movement shown in the figure. FCFS is a 'just' algorithm, because the process that makes a request first is served first, but it may not be the best in terms of reducing head movement, as is clear from the figure. (Figure: seek pattern under the FCFS strategy for the above disk request pattern.) First-come-first-served (FCFS) scheduling has two major drawbacks: (i) seeking to randomly distributed locations results in long waiting times, and (ii) under heavy loads the system can become overwhelmed. To minimize delays, requests should instead be serviced in an order that involves the least mechanical motion. 12.4.2 Shortest Seek Time First (SSTF) Each time an I/O request has been completed, the disk controller selects the waiting request whose sector location is closest to the current position of the head. The movement across the surface of the disk is still apparently random, but the time spent in movement is minimized; a small sketch comparing the two policies follows.
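The sketch below computes the total head movement for FCFS and SSTF on the request sequence from the figure (63, 33, 72, 47, 8, 99, 74, 52, 75). The starting head position of cylinder 50 is an assumption made for the example.

    #include <stdio.h>
    #include <stdlib.h>

    #define N 9

    /* FCFS: serve requests strictly in arrival order. */
    long fcfs_distance(const int *req, int n, int head)
    {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += abs(req[i] - head);
            head = req[i];
        }
        return total;
    }

    /* SSTF: always serve the closest pending request next. */
    long sstf_distance(const int *req, int n, int head)
    {
        int done[N] = {0};
        long total = 0;
        for (int served = 0; served < n; served++) {
            int best = -1;
            for (int i = 0; i < n; i++)
                if (!done[i] && (best < 0 ||
                    abs(req[i] - head) < abs(req[best] - head)))
                    best = i;
            total += abs(req[best] - head);
            head = req[best];
            done[best] = 1;
        }
        return total;
    }

    int main(void)
    {
        int req[N] = {63, 33, 72, 47, 8, 99, 74, 52, 75};
        printf("FCFS: %ld cylinders\n", fcfs_distance(req, N, 50));
        printf("SSTF: %ld cylinders\n", sstf_distance(req, N, 50));
        return 0;
    }

On this input, SSTF travels far fewer cylinders than FCFS, which illustrates the throughput advantage discussed next.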
This policy will have better throughput than FCFS, but a request may be delayed for a long period if many closely located requests arrive just after it. (Figure: seek pattern under the SSTF strategy.) The advantages of SSTF are higher throughput and lower response times than FCFS, and it is a reasonable solution for batch processing systems. The disadvantages of SSTF are that (i) it does not ensure fairness, (ii) there is a possibility of indefinite postponement, (iii) there will be a high variance of response times, and (iv) the response time will generally be unacceptable for interactive systems. 12.4.3 SCAN The drive head sweeps across the entire surface of the disk, visiting the outermost cylinders before changing direction and sweeping back to the innermost cylinders. It selects the next waiting request whose location it will reach on its path backwards and forwards across the disk. Thus, the movement time should be less than for FCFS, but the policy is clearly fairer than SSTF. 12.4.4 Circular SCAN (C-SCAN) C-SCAN is similar to SCAN, but I/O requests are only satisfied when the drive head is traveling in one direction across the surface of the disk. The head sweeps from the innermost cylinder to the outermost cylinder satisfying the waiting requests in order of their locations. When it reaches the outermost cylinder it sweeps back to the innermost cylinder without satisfying any requests and then starts again. 12.5 RAM (Random access memory) Random access memory (usually known by its acronym, RAM) is a type of data storage used in computers. It takes the form of integrated circuits that allow the stored data to be accessed in any order — that is, at random and without the physical movement of the storage medium or a physical reading head. RAM is a volatile memory, as the information or instructions stored in it will be lost if the power is switched off. The word "random" refers to the fact that any piece of data can be returned in a constant time, regardless of its physical location and whether or not it is related to the previous piece of data. This contrasts with storage mechanisms such as tapes, magnetic discs and optical discs, which rely on the physical movement of the recording medium or a reading head. In these devices, the movement takes longer than the data transfer, and the retrieval time varies depending on the physical location of the next item. (Figure: a 1 GB DDR RAM module.) RAM is usually writable as well as readable, so "RAM" is often used interchangeably with "read-write memory". The alternative to this is "ROM", or Read Only Memory. Most types of RAM lose their data when the computer powers down. "Flash memory" is a ROM/RAM hybrid that can be written to, but which does not require power to maintain its contents. RAM is not strictly the opposite of ROM, however. The word random indicates a contrast with serial access or sequential access memory. "Random access" is also the name of an indexing method: hence, disk storage is often called "random access" because the reading head can move relatively quickly from one piece of data to another, and does not have to read all the data in between. However the final "M" is crucial: "RAM" (provided there is no additional term as in "DVD-RAM") always refers to a solid-state device. Many CPU-based designs actually have a memory hierarchy consisting of registers, on-die SRAM caches, DRAM, paging systems, and virtual memory or swap space on a hard drive.
This entire pool of memory may be referred to as "RAM" by many developers, even though the various subsystems can have very different access times, violating the original concept behind the "random access" term in RAM. Even within a hierarchy level such as DRAM, the specific row/column/bank/rank/channel/interleave organization of the components makes the access time variable, although not to the extent that access to rotating storage media or a tape is variable. 12.5.1 Overview The key benefit of RAM over types of storage which require physical movement is that retrieval times are short and consistent. They are short because no physical movement is necessary, and consistent because the time taken to retrieve a piece of data does not depend on its current distance from a physical head; it requires practically the same amount of time to access any piece of data stored in a RAM chip. Most other technologies have inherent delays for reading a particular bit or byte. The disadvantages of RAM over physically moving media are cost, and the loss of data when power is turned off. Because of this speed and consistency, RAM is used as 'main memory' or primary storage: the working area used for loading, displaying and manipulating applications and data. In most personal computers, the RAM is not an integral part of the motherboard or CPU—it comes in the easily upgraded form of modules called memory sticks or RAM sticks, about the size of a few sticks of chewing gum. These can quickly be removed and replaced. A smaller amount of random-access memory is also integrated with the CPU, but this is usually referred to as "cache" memory, rather than RAM. Modern RAM generally stores a bit of data as either a charge in a capacitor, as in dynamic RAM, or the state of a flip-flop, as in static RAM. Some types of RAM can detect or correct random faults called memory errors in the stored data, using RAM parity and error correction codes. Many types of RAM are volatile, which means that unlike some other forms of computer storage such as disk storage and tape storage, they lose all data when the computer is powered down. For these reasons, nearly all PCs use disks as "secondary storage". Software can "partition" a portion of a computer's RAM, allowing it to act as a much faster hard drive, called a RAM disk. Unless the memory used is non-volatile, a RAM disk loses the stored data when the computer is shut down. However, volatile memory can retain its data when the computer is shut down if it has a separate power source, usually a battery. If a computer becomes low on RAM during intensive application cycles, the computer can resort to so-called virtual memory. In this case, the computer temporarily uses hard drive space as additional memory. Constantly relying on this type of backup memory is called thrashing, which is generally undesirable, as virtual memory lacks the advantages of RAM. In order to reduce the dependency on virtual memory, more RAM can be installed. 12.5.2 Recent developments Currently, several types of non-volatile RAM, which will preserve data while powered down, are under development. The technologies used include carbon nanotubes and the magnetic tunnel effect. In summer 2003, a 128 KB magnetic RAM (MRAM) chip manufactured with 0.18 µm technology was introduced. The core technology of MRAM is based on the magnetic tunnel effect. In June 2004, Infineon Technologies unveiled a 16 MB prototype again based on 0.18 µm technology. Nantero built a functioning 10 GB carbon nanotube memory prototype array in 2004.
In 2006, solid-state memory came of age, especially when implemented as "solid-state disks", with capacities exceeding 150 gigabytes and speeds far exceeding traditional disks. This development has started to blur the definition between traditional random access memory and disks, dramatically reducing the difference in performance. 12.5.3 Memory wall The "memory wall" is the growing disparity of speed between the CPU and memory outside the CPU chip. An important reason for this disparity is the limited communication bandwidth beyond chip boundaries. From 1986 to 2000, CPU speed improved at an annual rate of 55% while memory speed only improved at 10%. Given these trends, it was expected that memory latency would become an overwhelming bottleneck in computer performance. Currently, CPU speed improvements have slowed significantly, partly due to major physical barriers and partly because current CPU designs have already hit the memory wall in some sense. Intel summarized these causes in their Platform 2015 documentation: “First of all, as chip geometries shrink and clock frequencies rise, the transistor leakage current increases, leading to excess power consumption and heat (more on power consumption below). Secondly, the advantages of higher clock speeds are in part negated by memory latency, since memory access times have not been able to keep pace with increasing clock frequencies. Third, for certain applications, traditional serial architectures are becoming less efficient as processors get faster (due to the so-called Von Neumann bottleneck), further undercutting any gains that frequency increases might otherwise buy. In addition, resistance-capacitance (RC) delays in signal transmission are growing as feature sizes shrink, imposing an additional bottleneck that frequency increases don't address.” The RC delays in signal transmission were also noted in “Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures”, which projects a maximum of 12.5% average annual CPU performance improvement between 2000 and 2014. The data on Intel processors clearly shows a slowdown in performance improvements in recent processors. However, Intel's Core 2 Duo processors (codenamed Conroe) showed a significant improvement over previous Pentium 4 processors; due to a more efficient architecture, performance increased while the clock rate actually decreased. 12.5.4 DRAM packaging For economic reasons, the large (main) memories found in personal computers, workstations, and non-handheld game consoles (such as the PlayStation and Xbox) normally consist of dynamic RAM (DRAM). Other parts of the computer, such as cache memories and data buffers in hard disks, normally use static RAM (SRAM). 12.6 Optical Disks An optical disc is an electronic data storage medium that can be written to and read using a low-powered laser beam. Originally developed in the late 1960s, the first optical disc, created by James T. Russell, stored data as micron-wide dots of light and dark. A laser read the dots, and the data was converted to an electrical signal, and finally to audio or visual output. However, the technology didn't appear in the marketplace until Philips and Sony came out with the compact disc (CD) in 1982. Since then, there has been a constant succession of optical disc formats, first in CD formats, followed by a number of DVD formats. Optical discs offer a number of advantages over magnetic storage media. An optical disc holds much more data.
The greater control and focus possible with laser beams (in comparison to tiny magnetic heads) means that more data can be written into a smaller space. Storage capacity increases with each new generation of optical media. Emerging standards, such as Blu-ray, offer up to 27 gigabytes (GB) on a single-sided 12-centimeter disc. In comparison, a diskette, for example, can hold 1.44 megabytes (MB). Optical discs are inexpensive to manufacture, and data stored on them is relatively impervious to most environmental threats, such as power surges or magnetic disturbances. 12.7 Let us sum up In this lesson we have learnt about a) the various disk scheduling strategies b) the Random access memory c) the optical disk 12.8 Points for discussion a) Compare disk scheduling algorithms 12.9 Model answers to Check your Progress The answer for the question given in section 12.8 is discussed here for helping the students to check their progress. FCFS (First Come, First Served) o perform operations in order requested o no reordering of work queue o no starvation: every request is serviced o poor performance SSTF (Shortest Seek Time First) o after a request, go to the closest request in the work queue, regardless of direction o reduces total seek time compared to FCFS o disadvantages: starvation is possible (the head may stay in one area of the disk if it is very busy), and switching directions slows things down SCAN o go from the outside to the inside servicing requests and then back from the inside to the outside servicing requests o repeats this over and over o reduces variance compared to SSTF LOOK o like SCAN but stops moving inwards (or outwards) when no more requests in that direction exist C-SCAN (circular scan) o moves inwards servicing requests until it reaches the innermost cylinder; then jumps to the outside cylinder of the disk without servicing any requests o repeats this over and over o variant: service requests from inside to outside, and then skip back to the innermost cylinder C-LOOK o moves inwards servicing requests until there are no more requests in that direction, then jumps back to the outermost outstanding request o repeats this over and over o variant: service requests from inside to outside, then skip back to the innermost request 12.10 Lesson - end activities After learning this chapter, try to discuss among your friends and answer these questions to check your progress. a) Discuss about various disk scheduling strategies b) Explain about FCFS c) Explain about SSTF d) Explain about SCAN e) Discuss about RAM f) Discuss about optical storage 12.11 References Charles Crowley, Chapter 15 of “Operating Systems – A Design-Oriented Approach”, Tata McGraw-Hill, 2001 H.M. Deitel, Chapter 12 of “Operating Systems”, Second Edition, Pearson Education, 2001 Andrew S. Tanenbaum, Chapter 5 of “Modern Operating Systems”, PHI, 1996 D.M.
Dhamdhere, Chapters 16, 17 of “Systems Programming and Operating Systems”, Tata McGraw-Hill, 1997 LESSON – 13: FILE SYSTEMS AND ORGANIZATION CONTENTS 13.1 Aims and Objectives 13.2 Introduction 13.2.1 Aspects of file systems 13.3 Types of file systems 13.3.1 Disk file systems 13.3.2 Flash file systems 13.3.3 Database file systems 13.3.4 Transactional file systems 13.3.5 Network file systems 13.3.6 Special purpose file systems 13.3.7 Flat file systems 13.4 File systems and operating systems 13.5 Multiple Filesystem Support 13.6 Organization 13.7 Examples of file systems 13.7.1 File systems under Unix and Linux systems 13.7.2 File systems under Mac OS X 13.7.3 File systems under Plan 9 from Bell Labs 13.7.4 File systems under Microsoft Windows 13.8 Let us sum up 13.9 Points for Discussion 13.10 Model Answers to Check your Progress 13.11 Lesson - end activities 13.12 References 13.1 Aims and Objectives In this lesson we will learn the basics of file systems, the various types of file systems, and the way files are organized and supported by operating systems. The objective of this lesson is to make the student aware of the basic concepts of file systems and their implementations in operating systems such as Unix, Linux, Mac OS X and Microsoft Windows. 13.2 Introduction In computing, a file system (often also written as filesystem) is a method for storing and organizing computer files and the data they contain to make it easy to find and access them. File systems may use a data storage device such as a hard disk or CD-ROM and involve maintaining the physical location of the files; they might provide access to data on a file server by acting as clients for a network protocol (e.g., NFS, SMB, or 9P clients); or they may be virtual and exist only as an access method for virtual data. More formally, a file system is a set of abstract data types that are implemented for the storage, hierarchical organization, manipulation, navigation, access, and retrieval of data. File systems share much in common with database technology, but it is debatable whether a file system can be classified as a special-purpose database (DBMS). 13.2.1 Aspects of file systems The most familiar file systems make use of an underlying data storage device that offers access to an array of fixed-size blocks, sometimes called sectors, generally 512 bytes each. The file system software is responsible for organizing these sectors into files and directories, and keeping track of which sectors belong to which file and which are not being used. However, file systems need not make use of a storage device at all. A file system can be used to organize and represent access to any data, whether it be stored or dynamically generated (e.g., from a network connection). Whether the file system has an underlying storage device or not, file systems typically have directories which associate file names with files, usually by connecting the file name to an index into a file allocation table of some sort, such as the FAT in an MS-DOS file system, or an inode in a Unix-like file system. Directory structures may be flat, or allow hierarchies where directories may contain subdirectories. In some file systems, file names are structured, with special syntax for filename extensions and version numbers. In others, file names are simple strings, and per-file metadata is stored elsewhere. Other bookkeeping information is typically associated with each file within a file system. The length of the data contained in a file may be stored as the number of blocks allocated for the file or as an exact byte count.
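On POSIX systems this per-file bookkeeping is exposed through the stat() call. The short sketch below prints the byte count and block count just described; the file name is an arbitrary choice for the example.

    #include <stdio.h>
    #include <sys/stat.h>

    int main(void)
    {
        struct stat st;
        /* "example.txt" is an arbitrary file name for this sketch. */
        if (stat("example.txt", &st) != 0) {
            perror("stat");
            return 1;
        }
        printf("size:   %lld bytes\n", (long long)st.st_size);
        printf("blocks: %lld (512-byte units)\n", (long long)st.st_blocks);
        return 0;
    }

The same structure also carries the timestamp, ownership and permission fields discussed next.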
The time that the file was last modified may be stored as the file's timestamp. Some file systems also store the file creation time, the time it was last accessed, and the time that the file's metadata was changed. (Note that many early PC operating systems did not keep track of file times.) Other information can include the file's device type (e.g., block, character, socket, subdirectory, etc.), its owner user-ID and group-ID, and its access permission settings (e.g., whether the file is read-only, executable, etc.). The hierarchical file system was an early research interest of Dennis Ritchie of Unix fame; previous implementations, notably IBM's, were restricted to only a few levels, even in their early databases like IMS. After the success of Unix, Ritchie extended the file system concept to every object in his later operating system developments, such as Plan 9 and Inferno. Traditional file systems offer facilities to create, move and delete both files and directories. They lack facilities to create additional links to a directory (hard links in Unix), rename parent links (".." in Unix-like OS), and create bidirectional links to files. Traditional file systems also offer facilities to truncate, append to, create, move, delete and in-place modify files. They do not offer facilities to prepend to or truncate from the beginning of a file, let alone arbitrary insertion into or deletion from a file. The operations provided are highly asymmetric and lack the generality to be useful in unexpected contexts. For example, interprocess pipes in Unix have to be implemented outside of the file system because the file system does not offer truncation from the beginning of files. Secure access to basic file system operations can be based on a scheme of access control lists or capabilities. Research has shown access control lists to be difficult to secure properly, which is why research operating systems tend to use capabilities. Commercial file systems still use access control lists. Arbitrary attributes can be associated with files on advanced file systems, such as XFS, ext2/ext3, some versions of UFS, and HFS+, using extended file attributes. This feature is implemented in the kernels of the Linux, FreeBSD and Mac OS X operating systems, and allows metadata to be associated with the file at the file system level. This, for example, could be the author of a document, the character encoding of a plain-text document, or a checksum. 13.3 Types of file systems File system types can be classified into disk file systems, network file systems and special purpose file systems. 13.3.1 Disk file systems A disk file system is a file system designed for the storage of files on a data storage device, most commonly a disk drive, which might be directly or indirectly connected to the computer. Examples of disk file systems include FAT, FAT32, NTFS, HFS and HFS+, ext2, ext3, ISO 9660, ODS-5, and UDF. Some disk file systems are journaling file systems or versioning file systems. 13.3.2 Flash file systems A flash file system is a file system designed for storing files on flash memory devices. These are becoming more prevalent as the number of mobile devices is increasing, and the capacity of flash memories catches up with hard drives. While a block device layer can emulate hard drive behavior and store regular file systems on a flash device, this is suboptimal for several reasons: Erasing blocks: Flash memory blocks have to be explicitly erased before they can be written to.
The time taken to erase blocks can be significant, thus it is beneficial to erase unused blocks while the device is idle. 91
Random access: Disk file systems are optimized to avoid disk seeks whenever possible, due to the high cost of seeking. Flash memory devices impose no seek latency.
Wear levelling: Flash memory devices tend to "wear out" when a single block is repeatedly overwritten; flash file systems try to spread out writes as evenly as possible.
It turns out that log-structured file systems have all the desirable properties for a flash file system. Such file systems include JFFS2 and YAFFS.

13.3.3 Database file systems
A new concept for file management is the concept of a database-based file system. Instead of, or in addition to, hierarchical structured management, files are identified by their characteristics, like type of file, topic, author, or similar metadata.

13.3.4 Transactional file systems
This is a special kind of file system in that it logs events or transactions to files. Each operation that you do may involve changes to a number of different files and disk structures. In many cases, these changes are related, meaning that it is important that they all be executed at the same time. Take for example a bank sending another bank some money electronically. The bank's computer will "send" the transfer instruction to the other bank and also update its own records to indicate the transfer has occurred. If for some reason the computer crashes before it has had a chance to update its own records, then on reset there will be no record of the transfer, but the bank will be missing some money. A transactional system can rebuild the actions by resynchronizing the "transactions" on both ends to correct the failure. All transactions can be saved as well, providing a complete record of what was done and where. This type of file system is designed and intended to be fault tolerant, and necessarily incurs a high degree of overhead. 92

13.3.5 Network file systems
A network file system is a file system that acts as a client for a remote file access protocol, providing access to files on a server. Examples of network file systems include clients for the NFS and SMB protocols, and file-system-like clients for FTP and WebDAV.

13.3.6 Special purpose file systems
A special purpose file system is basically any file system that is not a disk file system or network file system. This includes systems where the files are arranged dynamically by software, intended for such purposes as communication between computer processes or temporary file space. Special purpose file systems are most commonly used by file-centric operating systems such as Unix. Examples include the procfs (/proc) file system used by some Unix variants, which grants access to information about processes and other operating system features.
Deep space science exploration craft, like Voyager I and II, used digital tape based special file systems. Most modern space exploration craft like Cassini-Huygens used real-time operating system file systems or RTOS-influenced file systems. The Mars Rovers are one such example of an RTOS file system, important in this case because they are implemented in flash memory.

13.3.7 Flat file systems
In a flat file system, there are no subdirectories—everything is stored at the same (root) level on the media, be it a hard disk, floppy disk, etc. While simple, this system rapidly becomes inefficient as the number of files grows, and makes it difficult for users to organise data into related groups.
Like many small systems before it, the original Apple Macintosh featured a flat file system, called the Macintosh File System (MFS). Its version of Mac OS was unusual in that the file management software (Macintosh Finder) created the illusion of a partially hierarchical filing system on top of MFS. This structure meant that every file on a disk had to have a unique name, even if it appeared to be in a separate folder. MFS was quickly replaced with the Hierarchical File System, which supported real directories.

13.4 File systems and operating systems
Most operating systems provide a file system, and a file system is an integral part of any modern operating system. Early microcomputer operating systems' only real task was file management — a fact reflected in their names (see DOS). Some early operating systems had a separate component for handling file systems which was called a disk operating system. On some microcomputers, the disk operating system was loaded separately from the rest of the operating system. On early operating systems, there was usually support for only one, native, unnamed file system; for example, CP/M supports only its own file system, which might be called "CP/M file system" if needed, but which didn't bear any official name at all.
Because of this, there needs to be an interface provided by the operating system software between the user and the file system. 93 This interface can be textual (such as provided by a command line interface, such as the Unix shell, or OpenVMS DCL) or graphical (such as provided by a graphical user interface, such as file browsers). If graphical, the metaphor of the folder, containing documents, other files, and nested folders is often used (see also: directory and folder).

13.5 Multiple Filesystem Support
With the expansion of network computing, it became desirable to support both local and remote filesystems. To simplify the support of multiple filesystems, the developers added a new virtual node or vnode interface to the kernel. The set of operations exported from the vnode interface appear much like the filesystem operations previously supported by the local filesystem. However, they may be supported by a wide range of filesystem types:
Local disk-based filesystems
Files imported using a variety of remote filesystem protocols
Read-only CD-ROM filesystems
Filesystems providing special-purpose interfaces - for example, the /proc filesystem
A few variants of 4.4BSD, such as FreeBSD, allow filesystems to be loaded dynamically when the filesystems are first referenced by the mount system call.

13.6 Organization
1. A file is organized logically as a sequence of records.
2. Records are mapped onto disk blocks.
3. Files are provided as a basic construct in operating systems, so we assume the existence of an underlying file system.
4. Blocks are of a fixed size determined by the operating system.
5. Record sizes vary.
6. In a relational database, tuples of distinct relations may be of different sizes.
7. One approach to mapping a database to files is to store records of one length in a given file.
8. An alternative is to structure files to accommodate variable-length records. (Fixed-length records are easier to implement.) 94

13.7 Examples of file systems

13.7.1 File systems under Unix and Linux systems
Unix operating systems create a virtual file system, which makes all the files on all the devices appear to exist in a single hierarchy. This means, in Unix, there is one root directory, and every file existing on the system is located under it somewhere.
Furthermore, the Unix root directory does not have to be in any physical place. It might not be on your first hard drive - it might not even be on your computer. Unix can use a network shared resource as its root directory.
Unix assigns a device name to each device, but this is not how the files on that device are accessed. Instead, to gain access to files on another device, you must first inform the operating system where in the directory tree you would like those files to appear. This process is called mounting a file system. For example, to access the files on a CD-ROM, one must tell the operating system "Take the file system from this CD-ROM and make it appear under such-and-such directory". The directory given to the operating system is called the mount point - it might, for example, be /media. The /media directory exists on many Unix systems (as specified in the Filesystem Hierarchy Standard) and is intended specifically for use as a mount point for removable media such as CDs, DVDs and floppy disks. It may be empty, or it may contain subdirectories for mounting individual devices. Generally, only the administrator (i.e. root user) may authorize the mounting of file systems. A sketch of what mounting looks like at the system call level is given after this list.
Unix-like operating systems often include software and tools that assist in the mounting process and provide new functionality. Some of these strategies have been coined "auto-mounting" as a reflection of their purpose.
1. In many situations, file systems other than the root need to be available as soon as the operating system has booted. All Unix-like systems therefore provide a facility for mounting file systems at boot time. System administrators define these file systems in the configuration file fstab, which also indicates options and mount points.
2. In some situations, there is no need to mount certain file systems at boot time, although their use may be desired thereafter. There are some utilities for Unix-like systems that allow the mounting of predefined file systems upon demand.
3. Removable media have become very common with microcomputer platforms. They allow programs and data to be transferred between machines without a physical connection. Common examples include USB flash drives, CD-ROMs and DVDs. Utilities have therefore been developed to detect the presence and availability of a medium and then mount that medium without any user intervention.
4. Progressive Unix-like systems have also introduced a concept called supermounting. For example, a floppy disk that has been supermounted can be physically removed from the system. Under normal circumstances, the disk should have been synchronised and then unmounted before its removal. Provided synchronisation has occurred, a different disk can be inserted into the drive. The system automatically notices that the disk has changed and updates the mount point contents to reflect the new medium. Similar functionality is found on standard Windows machines. 95
5. A similar innovation preferred by some users is the use of autofs, a system that, like supermounting, eliminates the need for manual mounting commands. The difference from supermount, other than compatibility in an apparent greater range of applications such as access to file systems on network servers, is that devices are mounted transparently when requests to their file systems are made, as would be appropriate for file systems on network servers, rather than relying on events such as the insertion of media, as would be appropriate for removable media.
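To make the mounting concept concrete, the following is a minimal sketch in C of what a mount utility does underneath, using the Linux-specific mount(2) system call. The device path /dev/cdrom, the mount point /media/cdrom, and the iso9660 file system type are illustrative assumptions, and the program would have to run with superuser privileges:

    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        /* Attach the ISO 9660 file system on the (assumed) device /dev/cdrom
           to the (assumed, pre-existing) directory /media/cdrom, read-only,
           so the CD-ROM's files appear under that directory. */
        if (mount("/dev/cdrom", "/media/cdrom", "iso9660", MS_RDONLY, NULL) != 0) {
            perror("mount");
            return 1;
        }
        return 0;
    }

An equivalent entry in the fstab configuration file would let the system perform the same mount automatically at boot time, which is what item 1 in the list above describes.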
13.7.2 File systems under Mac OS X
Mac OS X uses a file system that it inherited from Mac OS called HFS Plus. HFS Plus is a metadata-rich and case-preserving file system. Due to the Unix roots of Mac OS X, Unix permissions were added to HFS Plus. Later versions of HFS Plus added journaling to prevent corruption of the file system structure and introduced a number of optimizations to the allocation algorithms in an attempt to defragment files automatically without requiring an external defragmenter.
Filenames can be up to 255 characters. HFS Plus uses Unicode to store filenames. On Mac OS X, the filetype can come from the type code stored in the file's metadata or from the filename.
HFS Plus has three kinds of links: Unix-style hard links, Unix-style symbolic links and aliases. Aliases are designed to maintain a link to their original file even if they are moved or renamed; they are not interpreted by the file system itself, but by the File Manager code in userland.
Mac OS X also supports the UFS file system, derived from the BSD Unix Fast File System via NeXTSTEP.

13.7.3 File systems under Plan 9 from Bell Labs
Plan 9 from Bell Labs was originally designed to extend some of Unix's good points, and to introduce some new ideas of its own while fixing the shortcomings of Unix.
With respect to file systems, the Unix system of treating things as files was continued, but in Plan 9, everything is treated as a file, and accessed as a file would be (i.e., no ioctl or mmap). Perhaps surprisingly, while the file interface is made universal it is also simplified considerably: for example, symlinks, hard links and suid are made obsolete, and an atomic create/open operation is introduced. More importantly, the set of file operations becomes well defined, and subversions of this like ioctl are eliminated.
Secondly, the underlying 9P protocol was used to remove the difference between local and remote files (except for a possible difference in latency). This has the advantage that a device or devices, represented by files, on a remote computer could be used as though they were the local computer's own device(s). This means that under Plan 9, multiple file servers provide access to devices, classing them as file systems. Servers for "synthetic" file systems can also run in user space, bringing many of the advantages of microkernel systems while maintaining the simplicity of the system. 96
Everything on a Plan 9 system has an abstraction as a file; networking, graphics, debugging, authentication, capabilities, encryption, and other services are accessed via I/O operations on file descriptors. For example, this allows the use of the IP stack of a gateway machine without need of NAT, or provides a network-transparent window system without the need of any extra code.
Another example: a Plan 9 application receives FTP service by opening an FTP site. The ftpfs server handles the open by essentially mounting the remote FTP site as part of the local file system. With ftpfs as an intermediary, the application can now use the usual filesystem operations to access the FTP site as if it were part of the local file system. A further example is the mail system, which uses file servers that synthesize virtual files and directories to represent a user mailbox as /mail/fs/mbox. The wikifs provides a file system interface to a wiki.
These file systems are organized with the help of private, per-process namespaces, allowing each process to have a different view of the many file systems that provide resources in a distributed system.
13.7.4 File systems under Microsoft Windows
Windows makes use of the FAT and NTFS (New Technology File System) file systems.
The FAT (File Allocation Table) filing system, supported by all versions of Microsoft Windows, was an evolution of that used in Microsoft's earlier operating system (MS-DOS, which in turn was based on 86-DOS). FAT ultimately traces its roots back to the short-lived M-DOS project and Standalone Disk BASIC before it. Over the years various features have been added to it, inspired by similar features found on file systems used by operating systems such as UNIX.
Older versions of the FAT file system (FAT12 and FAT16) had file name length limits, a limit on the number of entries in the root directory of the file system, and restrictions on the maximum size of FAT-formatted disks or partitions. Specifically, FAT12 and FAT16 had a limit of 8 characters for the file name and 3 characters for the extension. This is commonly referred to as the 8.3 filename limit. VFAT, which was an extension to FAT12 and FAT16 introduced in Windows NT 3.5 and subsequently included in Windows 95, allowed long file names (LFN). FAT32 also addressed many of the limits in FAT12 and FAT16, but remains limited compared to NTFS.
NTFS, introduced with the Windows NT operating system, allowed ACL-based permission control. Hard links, multiple file streams, attribute indexing, quota tracking, compression and mount-points for other file systems (called "junctions") are also supported, though not all these features are well-documented.
Unlike many other operating systems, Windows uses a drive letter abstraction at the user level to distinguish one disk or partition from another. For example, the path C:\WINDOWS represents a directory WINDOWS on the partition represented by the letter C. The C drive is most commonly used for the primary hard disk partition, on which Windows is installed and from which it boots. This "tradition" has become so firmly ingrained that bugs came about in older versions of Windows which made assumptions that the drive that the operating system was installed on was C. The tradition of using "C" for the drive letter can be traced to MS-DOS, where the letters A and B were reserved for up to two floppy disk drives; 97 in a common configuration, A would be the 3½-inch floppy drive, and B the 5¼-inch one. Network drives may also be mapped to drive letters.

13.8 Let us sum up
In this lesson we have learnt about
a) the various types of file systems
b) the multiple file system support
c) and examples of file systems

13.9 Points for Discussion
a) Define file system
b) Define flat file system

13.10 Model Answers to Check your Progress
In order to check your progress, the answer for the first question in section 13.9 is given here.
In computing, a file system (often also written as filesystem) is a method for storing and organizing computer files and the data they contain to make it easy to find and access them. File systems may use a data storage device such as a hard disk or CD-ROM and involve maintaining the physical location of the files; they might provide access to data on a file server by acting as clients for a network protocol (e.g., NFS, SMB, or 9P clients); or they may be virtual and exist only as an access method for virtual data (e.g., procfs). More formally, a file system is a set of abstract data types that are implemented for the storage, hierarchical organization, manipulation, navigation, access, and retrieval of data.
File systems share much in common with database technology, but it is debatable whether a file system can be classified as a special-purpose database (DBMS).

13.11 Lesson - end activities
After learning this chapter, try to discuss among your friends and answer these questions to check your progress.
a) Discuss about various file systems
b) Discuss about UNIX and Linux file systems
c) Discuss about Microsoft Windows file systems

13.12 References
Charles Crowley, Chapter 16, 17 of “Operating Systems – A Design-Oriented Approach”, Tata McGraw-Hill, 2001
H.M. Deitel, Chapter 13 of “Operating Systems”, Second Edition, Pearson Education, 2001
Andrew S. Tanenbaum, Chapter 4 of “Modern Operating Systems”, PHI, 1996
D.M. Dhamdhere, Chapter 17 of “Systems Programming and Operating Systems”, Tata McGraw-Hill, 1997 98

LESSON – 14: FILE ALLOCATION

CONTENTS
14.1 Aims and Objectives
14.2 Introduction
14.3 Free Space Management
14.4 Contiguous allocation
14.5 Linked allocation
14.6 Indexed allocation
14.7 Implementation issues
14.7.1 Memory allocation
14.7.2 Fixed Sized Allocation
14.7.3 Variable Size Allocation
14.7.4 Memory Allocation with Paging
14.7.5 Memory Mapped files
14.7.6 Hardware support
14.7.7 Copy-on-Write
14.8 Let us sum up
14.9 Points for discussion
14.10 Model answer to check your progress
14.11 Lesson - end activities
14.12 References

14.1 Aims and Objectives
In this lesson we will learn about the basics of file allocation. The objective of this lesson is to make the student aware of the basic concepts of the following
a) Free Space Management,
b) Contiguous allocation, Linked allocation, and Indexed allocation
c) Implementation issues 99

14.2 Introduction
The main discussion here is how to allocate space to files so that disk space is effectively utilized and files can be quickly accessed. Three major methods of allocating disk space are in wide use: contiguous allocation, linked allocation and indexed allocation, each one having its own advantages and disadvantages.

14.3 Free Space Management
To keep track of the free space, the file system maintains a free space list which records all disk blocks which are free. To create a file, we search the free space list for the required amount of space and allocate it to the new file. This space is then removed from the free space list. When a file is deleted, its disk space is added to the free space list.

Bit-Vector
Frequently, the free-space list is implemented as a bit map or bit vector. Each block is represented by one bit. If the block is free, the bit is 0; if the block is allocated, the bit is 1. For example, consider a disk where blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, and 27 are free, and the rest of the blocks are allocated. The free-space bit map would be:
11000011000000111001111110001111…
The main advantage of this approach is that it is relatively simple and efficient to find n consecutive free blocks on the disk. Unfortunately, bit vectors are inefficient unless the entire vector is kept in memory for most accesses. Keeping it in main memory is possible for smaller disks such as on microcomputers, but not for larger ones.

Linked List
Another approach is to link all the free disk blocks together, keeping a pointer to the first free block. This block contains a pointer to the next free disk block, and so on. In the previous example, a pointer could be kept to block 2, as the first free block.
Block 2 would contain a pointer to block 3, which would point to block 4, which would point to block 5, which would point to block 8, and so on. This scheme is not efficient; to traverse the list, each block must be read, which requires substantial I/O time.

Grouping
A modification of the free-list approach is to store the addresses of n free blocks in the first free block. The first n-1 of these are actually free. The last one is the disk address of another block containing the addresses of another n free blocks. The importance of this implementation is that the addresses of a large number of free blocks can be found quickly. 100

Counting
Another approach is to take advantage of the fact that, generally, several contiguous blocks may be allocated or freed simultaneously, particularly when contiguous allocation is used. Thus, rather than keeping a list of free disk addresses, the address of the first free block is kept along with the number n of free contiguous blocks that follow the first block. Each entry in the free-space list then consists of a disk address and a count. Although each entry requires more space than would a simple disk address, the overall list will be shorter, as long as the count is generally greater than 1.

14.4 Contiguous allocation
The contiguous allocation method requires each file to occupy a set of contiguous addresses on the disk. Disk addresses define a linear ordering on the disk. Notice that, with this ordering, accessing block b+1 after block b normally requires no head movement. When head movement is needed (from the last sector of one cylinder to the first sector of the next cylinder), it is only one track. Thus, the number of disk seeks required for accessing contiguously allocated files is minimal, as is seek time when a seek is finally needed.
Contiguous allocation of a file is defined by the disk address of the first block and the length of the file. If the file is n blocks long, and starts at location b, then it occupies blocks b, b+1, b+2, …, b+n-1. The directory entry for each file indicates the address of the starting block and the length of the area allocated for this file.
The difficulty with contiguous allocation is finding space for a new file. If the file to be created is n blocks long, then the OS must search for n free contiguous blocks. First-fit, best-fit, and worst-fit strategies (as discussed in Chapter 4 on multiple partition allocation) are the most common strategies used to select a free hole from the set of available holes. Simulations have shown that both first-fit and best-fit are better than worst-fit in terms of both time and storage utilization. Neither first-fit nor best-fit is clearly best in terms of storage utilization, but first-fit is generally faster.
These algorithms also suffer from external fragmentation. As files are allocated and deleted, the free disk space is broken into little pieces. External fragmentation exists when enough total disk space exists to satisfy a request, but this space is not contiguous; storage is fragmented into a large number of small holes.
Another problem with contiguous allocation is determining how much disk space is needed for a file. When the file is created, the total amount of space it will need must be known and allocated. How does the creator (program or person) know the size of the file to be created? In some cases, this determination may be fairly simple (e.g. copying an existing file), but in general the size of an output file may be difficult to estimate.
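To make the search for n free contiguous blocks concrete, the following is a minimal sketch in C of a first-fit scan over the bit-vector free list of Section 14.3 (0 marks a free block, 1 an allocated one); the disk size and the helper name are illustrative assumptions:

    #include <stddef.h>

    #define NBLOCKS 4096                       /* assumed disk size in blocks */
    static unsigned char bitmap[NBLOCKS / 8];  /* 0 = free, 1 = allocated */

    static int block_is_free(size_t i)
    {
        return !(bitmap[i / 8] & (1 << (i % 8)));
    }

    /* First-fit: return the index of the first block that begins a run of
       n (n >= 1) contiguous free blocks, or -1 if no such run exists. */
    long first_fit(size_t n)
    {
        size_t run = 0;                        /* length of the current free run */
        for (size_t i = 0; i < NBLOCKS; i++) {
            run = block_is_free(i) ? run + 1 : 0;
            if (run == n)
                return (long)(i - n + 1);      /* start of the run found */
        }
        return -1;                             /* fragmented or full */
    }

Best-fit would instead scan every free run and remember the smallest one of at least n blocks; worst-fit would remember the largest.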
14.5 Linked allocation
The problems in contiguous allocation can be traced directly to the requirement that the spaces be allocated contiguously and that the files that need these spaces are of different sizes. These requirements can be avoided by using linked allocation. 101
In linked allocation, each file is a linked list of disk blocks. The directory contains a pointer to the first (and optionally the last) block of the file. For example, a file of 5 blocks which starts at block 4 might continue at block 7, then block 16, block 10, and finally block 27. Each block contains a pointer to the next block, and the last block contains a NIL pointer. The value -1 may be used for NIL to differentiate it from block 0.
With linked allocation, each directory entry has a pointer to the first disk block of the file. This pointer is initialized to nil (the end-of-list pointer value) to signify an empty file. A write to a file removes the first free block and writes to that block. This new block is then linked to the end of the file. To read a file, the pointers are just followed from block to block.
There is no external fragmentation with linked allocation. Any free block can be used to satisfy a request. Notice also that there is no need to declare the size of a file when that file is created. A file can continue to grow as long as there are free blocks.
Linked allocation does have disadvantages, however. The major problem is that it is inefficient to support direct access; it is effective only for sequential-access files. To find the ith block of a file, the system must start at the beginning of that file and follow the pointers until the ith block is reached. Note that each access to a pointer requires a disk read.
Another severe problem is reliability. A bug in the OS or a disk hardware failure might result in pointers being lost or damaged, the effect of which could be picking up a wrong pointer and linking it to a free block or into another file.

14.6 Indexed allocation
The indexed allocation method is the solution to the problems of both contiguous and linked allocation. This is done by bringing all the pointers together into one location called the index block. Of course, the index block will occupy some space and thus could be considered an overhead of the method.
In indexed allocation, each file has its own index block, which is an array of disk sector addresses. The ith entry in the index block points to the ith sector of the file. The directory contains the address of the index block of a file. To read the ith sector of the file, the pointer in the ith index block entry is read to find the desired sector. Indexed allocation supports direct access, without suffering from external fragmentation. Any free block anywhere on the disk may satisfy a request for more space.

14.7 Implementation issues

14.7.1 Memory allocation
Dynamic memory allocation involves 2 central commands: malloc allocates a portion of unused memory for use by a process, while free frees a previously-allocated portion of memory, allowing it to be reused. The operating system must allocate memory among all running processes, but processes themselves must also allocate memory at a finer granularity. Many of the issues are the same in each case. So how can we implement these commands efficiently? First, let's consider an extremely simple algorithm: fixed size allocation. 102

14.7.2 Fixed Sized Allocation
Assume N = 32 bytes: everything allocated is exactly 32 bytes long. Suppose we are given a 32 MB heap like this:
Fig 1: 32 MB heap
Fig 2: Free bitmap
This 32 MB heap is divided into 32 byte chunks. In order to determine which chunks are free we need to do a little bookkeeping. Since there are 2^20 chunks, you will need a free bitmap, with one bit per chunk, to do the bookkeeping. Each bit represents a chunk in the heap: if the bit is 1, then the chunk is being used; if the bit is 0, then the corresponding chunk is free for use. With this bitmap, the algorithm for allocating a chunk would be:
Allocate:
    search through the bitmap
    look for a free location
    turn the bit to 1
    return the chunk
Keep in mind this whole sequence should be synchronized. Therein lies the problem of two processes attempting to allocate the same chunk at the same time. This would cause a very dangerous race condition. To free a memory location the algorithm would be:
Free(chunk #i):
    bitmap[i/8] &= ~(1 << (i % 8));
(The & ~ clears the chunk's bit, marking it free.) Also note that without synchronization, if two threads free two chunks in the same byte at the same time, one chunk might not look free.
There are both positives and negatives to this design. The positive is that it uses a single local data structure. However, this positive is more useful for disks than memory. Memory is much faster than disk in regards to changing to different chunks. The negative is much more glaring in this design. It is an O(n) algorithm for allocation! This is far too inefficient to work as memory allocation. As a result, we should look for other solutions.
One proposed solution would be to use 2 pointers: one pointer at the end of the heap and one free pointer that points to the most recently freed chunk. It would look like this: 103
Fig 3: Free and End pointer implementation
The allocation of memory would go like this:
Allocate:
    if free < end
        return free++
This would have an allocation algorithm of O(1). However there is a serious problem with this design: how would we ever free a chunk that was not the most recently allocated?
So what's a better solution? Make a free list: a linked list of free chunks. The head of the free list points to the first free chunk, and each free chunk points to the next free chunk. The design would be:
Allocate:
    if (free != NULL)
        chunk = free
        free = free->next
        return chunk
Free(p):
    p->next = free
    free = p
This uses the concept of using free space to maintain bookkeeping state. The more free space we have, the more free pointers we need. Conversely, the less free space we have, the less of a need for free pointers. As a result, this design also results in optimal use of space. We can allocate all but one of the chunks (namely, the chunk containing the free pointer itself). And, since all allocations return exactly one chunk, and chunks are 100% independent of one another, the heap is nearly 100% utilizable.
But a malloc() that only works for N=32 is not much of a malloc at all! What happens when we do not have fixed sized allocation? Let us look at cases when N is an arbitrary number and the heap is still 32 MB. This is called variable size allocation. 104

14.7.3 Variable Size Allocation
In variable size allocation, unlike fixed-size allocation, we need to keep track of some bookkeeping information for allocated chunks as well as for free chunks. In particular, the free function must know the size of the chunk it's freeing! One common way to do this is to store bookkeeping information immediately before the pointer returned by malloc.
For example:

    typedef struct malloc_chunk {
        int sentinel;
        struct malloc_chunk *prev;
        struct malloc_chunk *next;
        char actual_data[];   /* pointer returned from malloc() points here */
    } malloc_chunk_t;

Then, free can just use simple pointer arithmetic to find the bookkeeping information. A simple heap using this bookkeeping structure might look like this.
Fig 4: Implementation of variable sized allocation.
The first component of each chunk is called the "sentinel". It shows whether the chunk is allocated (A) or not (F). The second component has a pointer to the previous chunk, and the third has a pointer to the next chunk. (Notice that those pointers point at the chunk data, not at the initial sentinel!) An "X" in the second or third column indicates that there is no previous/next chunk. The fourth component is the chunk data; we write the size of the data there. The first three columns are 4 bytes each and the fourth column is the size of the allocation. The list of chunks is kept in sorted order by address; the size of any chunk can be calculated through pointer arithmetic using the next field. As before, free chunks are additionally kept on their own free list, using the data areas; we don't show this list in the diagrams.
What happens when alloc(3000) is called on an empty heap?
Fig 5: The heap after a call to alloc(3000).
What happens when we call alloc(1MB)? We will just look at the latter half of the heap from now on. 105
Fig 6: The heap after a call to alloc(1MB).
What happens when we call alloc(10MB)?
Fig 7: The heap after a call to alloc(10MB).
What happens when we call free(the 1MB chunk)?
Fig 8: The heap after a call to free(the 1MB chunk).
Now, what happens when we call alloc(22MB)? The call will fail. Although 22MB are available, the free space is split into two chunks, a 21MB chunk and a 1MB chunk. The chunks are noncontiguous and there's no way to move them together. The heap is fragmented, since it is unable to allocate 22MB even though it has 22MB of free space. This type of fragmentation is called external fragmentation. (There can also be internal fragmentation if malloc() returns chunks that are bigger than the user requested. For example, malloc(3) might commonly return a chunk with 8 bytes of data area, rather than 3; the 5 remaining bytes are lost to internal fragmentation.) Free space is divided into noncontiguous chunks, so allocation can fail even though there is enough free space in total.
How can we solve external fragmentation? We could use compaction to solve this problem. The heap is copied, but the free space is compacted to the right side. This allows calls like alloc(22MB) now! 106 But compaction is extremely expensive. It requires copying a ton of data, and it also requires help from the programming language so that pointer variables can be updated correctly. Compaction is not used on OSes for modern architectures.
Compared to memory allocation with the constraint N = 32 bytes, variable size allocation has more overhead and can run into both external fragmentation and internal fragmentation. To avoid these issues, we turn to memory allocation with paging. This lets the operating system use fixed-size allocation, with all its benefits, to provide applications with variable-size allocation!

14.7.4 Memory Allocation with Paging
Fig 9: Virtual address space implementation (physical addresses run from 0 to 1GB-1, in 4KB pages; virtual addresses run from 0 to 4GB, i.e. 2^32).
Virtual address spaces:
1. Reduce fragmentation by separating contiguous virtual pages from contiguity in the physical address space.
2. Provide for isolation between processes. Each process has its own space.
Fig 10: Paging (the %cr3 register points to the page directory, whose entries point to page tables, whose entries map virtual pages to physical addresses).
Paging allows the OS to avoid external fragmentation: variable sized allocation is built from fixed size allocation plus hardware-supported address indirection. Page faults occur when you try to access an invalid address. This can happen on execute, read, and write accesses. The CPU traps on invalid accesses.

14.7.5 Memory Mapped files
When attempting a sequential read on a file on disk, we need to use system calls such as open(), read(), and write(). This can be quite costly if the file is rather large. Also, sharing becomes quite a pain. One alternative is memory mapped files. Memory mapping is when a process marks a portion of its memory as corresponding to some file. For example, suppose you wanted to read and write to /tmp/foo. This can be accomplished by memory mapping: 107
/tmp/foo -> memory address 0x10000
So when someone accesses memory address 0x10000, they are now accessing the start of the file foo. Because of this, the function that invokes memory mapped I/O needs to return a pointer to the start of the file. This function is declared as:
void *mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t off);
The mmap function will return a pointer to the starting address of the mapping, also known as the start of the file. The addr parameter allows the user to suggest the address at which the mapping should begin. It is often best to pass NULL and allow the operating system to decide; if the user were to supply an address that the operating system does not like, an error will occur. The len parameter is merely the length of the region of the file that the user wants to map. The prot parameter specifies the protection the user would like to enforce; the user can combine such flags as PROT_READ, PROT_WRITE, and PROT_EXEC as permissions for the new memory mapped region. The flags parameter holds any miscellaneous flags that the user wishes to set; for example, to allow sharing, the user can set the MAP_SHARED flag. The fildes parameter is the file descriptor of the opened file. Finally, the off parameter is the offset in the file from which the mapping should start. An example of an invocation of this function would be:
addr = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, offset);
(POSIX requires exactly one of MAP_SHARED or MAP_PRIVATE in the flags argument.) Suppose that we invoke the mmap function with a length of 1 MB and file descriptor 2. It would cause the following effect:
Fig 11: Fault on 0x308002, loads from physical address space.
What occurs now is that the file's pages are read into physical memory in the background. When the process faults, the OS checks if the address is in mmapped space; if it is, it loads that page of data from disk (unless it is already cached) and adds a virtual memory mapping for that page.
Some advantages of memory mapping include no need for copying when it is not needed, and the ability to share a file amongst processes. If processes read the same file, they can share the same physical address. However, there are also some disadvantages to memory mapped I/O. A simple problem is that data must be page-aligned; user programs must be careful to ensure this. A more complex problem is the need to synchronize writes with the disk. Once a file is memory mapped, will the OS have to write the entire file back to disk, even if only one byte was written? Another complex problem is unwanted sharing.
Say a process P1 is reading a file, and another process P2 is writing to the same file. 108 Before memory-mapped I/O, P1 could get a stable version of the file by reading its data into memory; no one could change P1's memory copy of the file data. But with memory-mapped I/O, it's possible that P1 would see all of P2's changes to the file, as P2 made those changes!

14.7.6 Hardware support
After talks between software and hardware manufacturers, the hardware manufacturers were able to improve the hardware to help support memory management. The result was a page table entry like the following:
Fig 12: Page table entry layout (a 20-bit physical address plus 12 bits of flags, including P = present, W = write/read permission, U = user/supervisor, A = access bit, and D = dirty bit).
The new access bit and dirty bit are set by the hardware. The CPU sets the access bit to 1 when a portion of memory has been read or written, and sets the dirty bit to 1 when the portion of memory has been written to. The hardware never clears these bits; the operating system is expected to clear them as necessary.
This allows for more efficient writes to mmapped files. The operating system only needs to write out pages of mmapped files when they have the dirty bit set to 1. All other pages have not been altered and consequently do not need to be written back to disk. This reduces bus traffic as well as overhead. Once the dirty pages have been written back to disk, the OS clears the dirty bits on the pages of the mmapped file.
Fig 13: Writing to a mmapped file.
Trick: when synchronizing files with the disk, the operating system clears the dirty bit on all pages of the memory mapped file when they are first read in, and afterwards only writes out pages with the dirty bit set to 1. This reduces disk traffic for writes.

14.7.7 Copy-on-Write 109
Fig 14: Copy-on-Write.
To fix unwanted sharing, a copy-on-write algorithm can be applied. This algorithm allows sharing of pages between two processes until one process decides to write to a page. When one process tries to write to a page, the page is copied so that each process has its own version of the page, as depicted above. To implement copy-on-write, the operating system marks the pages in its bookkeeping structure as copy-on-write and sets the virtual address mapping to NON-WRITABLE. When a process tries to write to the page, it faults! When the operating system catches the fault, it copies the page, changes the virtual address mapping to point to the new copied page, and sets that mapping to WRITABLE. After this event, each process has its own copy of the page and is able to write to its own copy.

14.8 Let us sum up
In this lesson we have learnt about
a) free space management and the major file allocation methods
b) and the implementation issues of memory allocation

14.9 Points for discussion
a) Discuss about linked allocation
b) Discuss about indexed allocation

14.10 Model answer to Check your Progress
In order to check the progress of the student, the answer for the second question in section 14.9 is given here.
The indexed allocation method is the solution to the problems of both contiguous and linked allocation. This is done by bringing all the pointers together into one location called the index block. Of course, the index block will occupy some space and thus could be considered an overhead of the method. In indexed allocation, each file has its own index block, which is an array of disk sector addresses. The ith entry in the index block points to the ith sector of the file.
The directory contains the address of the index block of a file. To read the ith sector of the file, the pointer in the ith index block entry is read to find the desired sector. Indexed allocation supports direct access, without suffering from external fragmentation. Any free block anywhere on the disk may satisfy a request for more space. 110

14.11 Lesson - end activities
After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Discuss about memory allocation
b) Discuss about free space management

14.12 References
Charles Crowley, Chapter 16, 17 of “Operating Systems – A Design-Oriented Approach”, Tata McGraw-Hill, 2001
H.M. Deitel, Chapter 13 of “Operating Systems”, Second Edition, Pearson Education, 2001
Andrew S. Tanenbaum, Chapter 4 of “Modern Operating Systems”, PHI, 1996
D.M. Dhamdhere, Chapter 17 of “Systems Programming and Operating Systems”, Tata McGraw-Hill, 1997 111

LESSON – 15: FILE DESCRIPTORS AND ACCESS CONTROL

CONTENTS
15.1 Aims and Objectives
15.2 Introduction
15.3 File descriptor in programming
15.4 Operations on file descriptors
15.4.1 Creating file descriptors
15.4.2 Deriving file descriptors
15.4.3 Operations on a single file descriptor
15.4.4 Operations on multiple file descriptors
15.4.5 Operations on the file descriptor table
15.4.6 Operations that modify process state
15.4.7 File locking
15.4.8 Sockets
15.4.9 Miscellaneous
15.4.10 Upcoming Operations
15.5 Access Control Matrix
15.6 Let us sum up
15.7 Points for Discussion
15.8 Model Answers to Check your Progress
15.9 Lesson - end activities
15.10 References

15.1 Aims and Objectives
In this lesson we will learn about file descriptors and access control. The objectives of this lesson are to make the candidate aware of the following
a) file descriptors
b) operations on file descriptors
   a. creating
   b. deriving
   c. modifying, etc.
c) the access control matrix

15.2 Introduction 112
A file descriptor or file control block is a control block containing information the system needs to manage a file. The file descriptor is controlled by the operating system and is brought into primary storage when a file is opened. A file descriptor contains information regarding (i) symbolic file name, (ii) location of file, (iii) file organization (sequential, indexed, etc.), (iv) device type, (v) access control data, (vi) type (data file, object program, C source program, etc.), (vii) disposition (temporary or permanent), (viii) date and time of creation, (ix) destroy date, (x) last modified date and time, (xi) access activity counts (number of reads, etc.).

15.3 File descriptor in programming
In computer programming, a file descriptor is an abstract key for accessing a file. The term is generally used in POSIX operating systems. In Microsoft Windows terminology and in the context of the C standard I/O library, "file handle" is preferred, though the latter is technically a different object (see below).
In POSIX, a file descriptor is an integer, specifically of the C type int. There are 3 standard POSIX file descriptors which presumably every process (save perhaps a daemon) should expect to have:

Integer value   Name
0               Standard Input (stdin)
1               Standard Output (stdout)
2               Standard Error (stderr)

Generally, a file descriptor is an index for an entry in a kernel-resident data structure containing the details of all open files. In POSIX this data structure is called a file descriptor table, and each process has its own file descriptor table.
The user application passes the abstract key to the kernel through a system call, and the kernel will access the file on behalf of the application, based on the key. The application itself cannot read or write the file descriptor table directly.
In Unix-like systems, file descriptors can refer to files, directories, block or character devices (also called "special files"), sockets, FIFOs (also called named pipes), or unnamed pipes.
The FILE * file handle in the C standard I/O library routines is technically a pointer to a data structure managed by those library routines; one of those structures usually includes an actual low-level file descriptor for the object in question on Unix-like systems. Since a file handle refers to this additional layer, it is not interchangeable with a file descriptor. To further complicate terminology, Microsoft Windows also uses the term file handle to refer to the more low-level construct, akin to POSIX's file descriptors. Microsoft's C libraries also provide compatibility functions which "wrap" these native handles to support the POSIX-like convention of integer file descriptors as detailed above.
A program is passed a set of "open file descriptors", that is, pre-opened files. A setuid/setgid program must deal with the fact that the user gets to select what files are open and to what (within their permission limits). 113 A setuid/setgid program must not assume that opening a new file will always open into a fixed file descriptor id, or that the open will succeed at all. It must also not assume that standard input (stdin), standard output (stdout), and standard error (stderr) refer to a terminal or are even open.
The rationale behind this is simple: since an attacker can open or close a file descriptor before starting the program, the attacker could create an unexpected situation. If the attacker closes the standard output, when the program opens the next file it will be opened as though it were standard output, and the program will then send all its standard output to that file as well. Some C libraries will automatically open stdin, stdout, and stderr if they aren't already open (to /dev/null), but this isn't true on all Unix-like systems. Also, these libraries can't be completely depended on; for example, on some systems it's possible to create a race condition that causes this automatic opening to fail (and still run the program).
When using ILE C/400 stream I/O functions as defined by the American National Standards Institute (ANSI) to perform operations on a file, you identify the file through the use of pointers. When using the integrated file system C functions, you identify the file by specifying a file descriptor. A file descriptor is a positive integer that must be unique in each job. The job uses a file descriptor to identify an open file when performing operations on the file. The file descriptor is represented by the variable fildes in C functions that operate on the integrated file system and by the variable descriptor in C functions that operate on sockets.
Each file descriptor refers to an open file description, which contains information such as a file offset, status of the file, and access modes for the file. The same open file description can be referred to by more than one file descriptor, but a file descriptor can refer to only one open file description.
Figure 1: File descriptor and open file description.
If an ILE C/400 stream I/O function is used with the integrated file system, the ILE C/400 run-time support converts the file pointer to a file descriptor. When using the "root" (/), QOpenSys, or user-defined file systems, you can pass access to an open file description from one job to another, thus allowing the other job to access the file. You do this by using the givedescriptor() or takedescriptor() function to pass the file descriptor between jobs. 114
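Before cataloguing the operations, here is a minimal POSIX sketch of a file descriptor in use (the file name example.txt is an illustrative assumption): open() hands back the descriptor as a small int, read() and write() refer to the open file only through it, and close() releases it.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[128];
        ssize_t n;

        /* open() returns a small non-negative integer - the file
           descriptor - which indexes this process's descriptor table. */
        int fd = open("example.txt", O_RDONLY);
        if (fd == -1) {
            perror("open");
            return 1;
        }

        /* read() and write() identify the open file only by its descriptor. */
        while ((n = read(fd, buf, sizeof buf)) > 0)
            write(STDOUT_FILENO, buf, (size_t)n);

        close(fd);   /* frees the descriptor table entry for reuse */
        return 0;
    }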
When using the "root" (/), QOpenSys, or user-defined file systems, you can pass access to an open file description from one job to another, thus allowing the job to access the file. You do this by using the givedescriptor() or takedescriptor() function to pass the file descriptor between jobs. 114 15.4 Operations on file descriptors A modern Unix typically provides the following operations on file descriptors. 15.4.1 Creating file descriptors open(), open64(), creat(), creat64() socket() socketpair() pipe() 15.4.2 Deriving file descriptors fileno() dirfd() 15.4.3 Operations on a single file descriptor read(), write() recv(), send() recvmsg(), sendmsg() (inc. allowing sending FDs) sendfile() lseek(), lseek64() fstat(), fstat64() fchmod() fchown() fdopen() gzdopen() ftruncate() 15.4.4 Operations on multiple file descriptors select(), pselect() poll(), epoll() 15.4.5 Operations on the file descriptor table close() dup() dup2() fcntl (F_DUPFD) fcntl (F_GETFD and F_SETFD) 15.4.6 Operations that modify process state fchdir(): sets the process's current working directory based on a directory file descriptor mmap(): maps ranges of a file into the process's address space 115 15.4.7 File locking flock() fcntl (F_GETLK, F_SETLK and F_SETLKW) lockf() 15.4.8 Sockets connect() bind() listen() accept(): creates a new file descriptor for an incoming connection getsockname() getpeername() getsockopt(), setsockopt() shutdown(): shuts down one or both halves of a full duplex connection 116 15.4.9 Miscellaneous ioctl(): a large collection of miscellaneous operations on a single file descriptor, often associated with a device 15.4.10 Upcoming Operations A series of new operations on file descriptors has been added to Solaris and Linux, as well as numerous C libraries, to be standardized in a future version of POSIX. The at suffix signifies that the function takes an additional first argument supplying a file descriptor from which relative paths are resolved, the forms lacking the at suffix thus becoming equivalent to passing a file descriptor corresponding to the current working directory. openat() faccessat() fchmodat() fchownat() fstatat() futimesat() linkat() mkdirat() mknodat() readlinkat() renameat() symlinkat() unlinkat() mkfifoat() 15.5 Access Control Matrix Access Control Matrix or Access Matrix is an abstract, formal computer protection and security model used in computer systems, that characterizes the rights of each subject with respect to every object in the system. It was first introduced by Butler W. Lampson in 1971. It is the most general description of operating system protection mechanism. According to the model a computer system consists of a set of objects O, that is the set of entities that needs to be protected (e.g. processes, files, memory pages) and a set of subjects S, that consists of all active entities (e.g. users, processes). Further there exists a set of rights R of the form r(s,o). A right thereby specifies the kind of access a subject is allowed to process with regard to an object. 117 Example In this matrix example there exists two processes, a file and a device. The first process has the ability to execute the second, read the file and write some information to the device, while the second process can only send information to the first. 
          Asset 1                     Asset 2                     File    Device
Role 1    read, write, execute, own   execute                     read    write
Role 2    read                        read, write, execute, own

Utility
Because it does not define the granularity of protection mechanisms, the Access Control Matrix can be used as a model of the static access permissions in any type of access control system. It does not model the rules by which permissions can change in any particular system, and therefore only gives an incomplete description of the system's access control security policy.
An Access Control Matrix should be thought of only as an abstract model of permissions at a given point in time; a literal implementation of it as a two-dimensional array would have excessive memory requirements. Capability-based security and access control lists are categories of concrete access control mechanisms whose static permissions can be modeled using Access Control Matrices. Although these two mechanisms have sometimes been presented (for example in Butler Lampson's Protection paper) as simply row-based and column-based implementations of the Access Control Matrix, this view has been criticized as drawing a misleading equivalence between systems that does not take into account dynamic behaviour.

15.6 Let us sum up
In this lesson we have learnt about
a) file descriptors
b) and the access control matrix

15.7 Points for Discussion
a) Define file descriptor
b) Explain about the implementation issues of file descriptors 118

15.8 Model Answers to Check your Progress
In order to check the progress of the candidate, the answer for the first question in section 15.7 is given below.
A file descriptor or file control block is a control block containing information the system needs to manage a file. The file descriptor is controlled by the operating system and is brought into primary storage when a file is opened. A file descriptor contains information regarding (i) symbolic file name, (ii) location of file, (iii) file organization (sequential, indexed, etc.), (iv) device type, (v) access control data, (vi) type (data file, object program, C source program, etc.), (vii) disposition (temporary or permanent), (viii) date and time of creation, (ix) destroy date, (x) last modified date and time, (xi) access activity counts (number of reads, etc.).

15.9 Lesson - end activities
After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Discuss about file operations
b) Discuss about the access control matrix

15.10 References
a) Charles Crowley, Chapter 16, 17 of “Operating Systems – A Design-Oriented Approach”, Tata McGraw-Hill, 2001
b) H.M. Deitel, Chapter 13 of “Operating Systems”, Second Edition, Pearson Education, 2001
c) Andrew S. Tanenbaum, Chapter 4 of “Modern Operating Systems”, PHI, 1996
d) D.M. Dhamdhere, Chapter 17 of “Systems Programming and Operating Systems”, Tata McGraw-Hill, 1997 119

UNIT – V

LESSON – 16: MS-DOS

CONTENTS
16.1 Aims and Objectives
16.2 Introduction
16.3 Accessing hardware under DOS
16.4 Early History of MS-DOS
16.5 User’s view of MS-DOS
16.5.1 File Management
16.5.2 Change Directory
16.5.3 Make Directory
16.5.4 Copy
16.5.5 Delete and Undelete
16.5.6 Rename
16.5.7 Parent directory
16.5.8 F3
16.5.9 Breaking
16.5.10 Wildcards (*) and (?)
16.6 Executing, Viewing, Editing, Printing
16.7 Backup Files
16.8 Other commands
16.8.1 Change the Default Drive
16.8.2 Change Directory Command
16.8.3 DIR (Directory) Command
16.8.4 ERASE Command
16.8.5 FORMAT Command
16.8.6 Rebooting the computer (Ctrl-Alt-Del)
16.8.7 RENAME (REN) Command
16.8.8 RMDIR (RD) Remove Directory Command
16.8.9 Stop Execution (Ctrl-Break)
16.9 The system view of MS-DOS
16.10 The future of MS-DOS
16.11 Let us sum up
16.12 Points for Discussion
16.13 Model Answers to Check your Progress
16.14 Lesson - end activities
16.15 References 120

16.1 Aims and Objectives
In this lesson we will learn about the introduction to MS-DOS and its commands. The objectives of this lesson are to make the candidate aware of the following
a) History of MS-DOS
b) MS-DOS Commands
c) User’s view of MS-DOS
d) System view of MS-DOS
e) And the future of MS-DOS

16.2 Introduction
A full time operating system was needed for IBM's 8086 line of computers, but negotiations for the use of CP/M on these broke down. IBM approached Microsoft's CEO, Bill Gates, who purchased QDOS from SCP allegedly for $50,000. This became Microsoft Disk Operating System, MS-DOS. Microsoft also licensed their system to multiple computer companies, who used their own names. Eventually, Microsoft would require the use of the MS-DOS name, with the exception of the IBM variant, which would continue to be developed concurrently and sold as PC-DOS (this was for IBM's new 'PC' using the 8088 CPU, internally the same as the 8086).
Early versions of Microsoft Windows were little more than a graphical shell for DOS, and later versions of Windows were tightly integrated with MS-DOS. It is also possible to run DOS programs under OS/2 and Linux using virtual-machine emulators. Because of the long existence and ubiquity of DOS in the world of the PC-compatible platform, DOS was often considered to be the native operating system of the PC compatible platform.
There are alternative versions of DOS, such as FreeDOS and OpenDOS. FreeDOS appeared in 1994, prompted by Microsoft Windows 95, which, unlike Windows 3.11, was not a shell over DOS and dispensed with standalone MS-DOS.
MS-DOS (and the IBM PC-DOS which was licensed therefrom), and its predecessor, 86-DOS, was inspired by CP/M (Control Program / (for) Microcomputers) - which was the dominant disk operating system for 8-bit Intel 8080 and Zilog Z80 based microcomputers. Tim Paterson at Seattle Computer Products developed a variant of CP/M-80, intended as an internal product for testing SCP's new 8086 CPU card for the S-100 bus. It did not run on the 8080 CPU needed for CP/M-80. The system was named 86-DOS (it had initially been called QDOS, which stood for Quick and Dirty Operating System).
Digital Research would attempt to regain the market with DR-DOS, an MS-DOS and CP/M hybrid. Digital Research would later be bought out by Novell, and DR DOS became Novell DOS 7. DR DOS would later be part of Caldera (as OpenDOS), Lineo (as DR DOS), and DeviceLogics.
Early versions of Microsoft Windows were shell programs that ran in DOS. Windows 3.11 extended the shell by going into protected mode and added 32-bit support. These were 16-bit/32-bit hybrids. Microsoft Windows 95 further reduced DOS to the role of the boot loader. Windows 98 and Windows Me were the last Microsoft operating systems to run on DOS. 121 The DOS-based branch was eventually abandoned in favor of Windows NT, the first true 32-bit system, which was the foundation for Windows XP and Windows Vista.
Windows NT, initially NT OS/2 3.0, was the result of collaboration between Microsoft and IBM to develop a 32-bit operating system with high hardware and software portability. Because of the success of Windows 3.0, Microsoft changed the application programming interface to the extended Windows API, which caused a split between the two companies and a branch in the operating system. IBM would continue to work on OS/2 and the OS/2 API, while Microsoft renamed its operating system Windows NT.

16.3 Accessing hardware under DOS
The operating system offers a hardware abstraction layer that allows development of character-based applications, but not for accessing most of the hardware, such as graphics cards, printers, or mice. This required programmers to access the hardware directly, resulting in each application having its own set of device drivers for each hardware peripheral. Hardware manufacturers would release specifications to ensure device drivers for popular applications were available.

16.4 Early History of MS-DOS
Microsoft bought non-exclusive rights to market 86-DOS in October 1980. In July 1981, Microsoft bought exclusive rights to 86-DOS (by now up to version 1.14) and renamed the operating system MS-DOS. The first IBM-branded version, PC-DOS 1.0, was released in August 1981. It supported up to 640 kB of RAM and four 160 kB 5.25" single-sided floppy disks.

Various versions of DOS:

Version - 1.1
In May 1982, PC-DOS 1.1 added support for 320 kB double-sided floppy disks.

Version - 2.0
PC-DOS 2.0 and MS-DOS 2.0, released in March 1983, were the first versions to support the PC/XT and fixed disk drives (commonly referred to as hard disk drives). Floppy disk capacity was increased to 180 kB (single-sided) and 360 kB (double-sided) by using nine sectors per track instead of eight. At the same time, Microsoft announced its intention to create a GUI for DOS. Its first version, Windows 1.0, was announced in November 1983, but was unfinished and did not interest IBM. By November 1985, the first finished version, Microsoft Windows 1.01, was released.

Version - 3.0
MS-DOS 3.0, released in September 1984, was the first to support 1.2 MB floppy disks and 32 MB hard disks.

Version - 3.1
MS-DOS 3.1, released in November of that year, introduced network support.

Version - 3.2
MS-DOS 3.2, released in April 1986, was the first retail release of MS-DOS. It added support for 720 kB 3.5" floppy disks. Previous versions had been sold only to computer manufacturers, who pre-loaded them on their computers, because operating systems were considered part of a computer, not an independent product.

Version - 3.3
MS-DOS 3.3, released in April 1987, featured logical disks: a physical disk could be divided into several partitions, treated as independent disks by the operating system. Support was also added for 1.44 MB 3.5" floppy disks. The first version of DR DOS was released in May 1988 and was compatible with MS/PC-DOS 3.3. Later versions of DR DOS would continue to identify themselves as "DOS 3.31" to applications, despite using newer version numbers.

Version - 4.0
MS-DOS 4.0, released in July 1988, supported disks up to 2 GB (disk sizes were typically 40-60 MB in 1988), and added a full-screen shell called DOSSHELL. Other shells, like Norton Commander and PCShell, already existed in the market. In November 1988, Microsoft addressed many bugs in a service release, MS-DOS 4.01. DR DOS skipped version 4 due to the perceived unpopularity of MS-DOS 4.x.
Wishing to get a jump on Microsoft, Digital Research released DR DOS 5 in May 1990, which included much more powerful utilities than previous DOS versions.

Version - 5.0
MS-DOS 5.0 was released in April 1991, mainly as a follow-up to DR DOS 5. It included the full-screen BASIC interpreter QBasic, which also provided a full-screen text editor (previously, MS-DOS had only a line-based text editor, edlin). A disk cache utility, SmartDrive, undelete capabilities, and other improvements were also included. It had severe problems with some disk utilities; these were fixed in MS-DOS 5.01, released later the same year.

Version - 6.0
MS-DOS 6.0 was released in March 1993. Following competition from Digital Research's SuperStor, Microsoft added a disk compression utility called DoubleSpace. At the time, typical hard disk sizes were about 200-400 MB, and many users badly needed more disk space. It turned out that DoubleSpace contained stolen code from another compression utility, Stacker, which led to later legal problems. MS-DOS 6.0 also featured the disk defragmenter DEFRAG, the backup program MSBACKUP, memory optimization with MEMMAKER, and rudimentary virus protection via MSAV.

Version - 6.2
As with versions 4.0 and 5.0, MS-DOS 6.0 turned out to be buggy. Due to complaints about loss of data, Microsoft released an updated version, MS-DOS 6.2, with an improved DoubleSpace utility, a new disk check utility, SCANDISK (similar to fsck from Unix), and other improvements. December 1993 saw the release of Novell DOS 7, which was DR DOS under a new name. Its multiple bugs, as well as DR DOS's already declining market share and Windows 95 looming on the horizon, led to low sales. By this time, PC DOS was at version 6.1, and IBM split its development from Microsoft. From this point, the two developed independently.

Version - 6.21
The next version of MS-DOS, 6.21 (released March 1994), appeared due to legal problems. Stac Electronics sued Microsoft over the source code stolen from its utility, Stacker, and forced it to remove DoubleSpace from the operating system.

Version - 6.22
In May 1994, Microsoft released MS-DOS 6.22, with another disk compression package, DriveSpace, licensed from VertiSoft Systems. MS-DOS 6.22 was the last standalone version of MS-DOS available to the general public. MS-DOS was withdrawn from marketing by Microsoft on November 30, 2001.

Version - 6.23
Microsoft also released versions 6.23 to 6.25 for banks and American military organizations. These versions introduced FAT32 support.

Version - 7.0
Microsoft Windows 95 incorporated MS-DOS version 7.0, but only as the kernel (as Windows became the full operating system). Windows 98 also used MS-DOS 7. At this point, Microsoft announced the abandonment of the DOS kernel and released Windows 2000 on the NT kernel, but following its commercial failure, released one more DOS-kernel Windows, Windows Me. The next system, Windows XP, was based on the NT kernel. Windows Me used MS-DOS 8; Windows XP and Vista continue to use MS-DOS 8 on emergency startup disks.

IBM released PC-DOS 7.0 in early 1995. It incorporated many new utilities such as anti-virus software, comprehensive backup programs, PCMCIA support, and DOS Pen extensions. Also added were new features to enhance available memory and disk space. The last version of PC DOS was PC DOS 2000, released in 1998. Its major feature was Y2K compatibility.

16.5 User’s view of MS-DOS
The user's view of MS-DOS deals with what the user can accomplish with MS-DOS commands.
This section deals with some of the important MS-DOS commands; others can be found in the MS-DOS manual.

In MS-DOS, the command processor interprets user commands. The command processor is stored in the file COMMAND.COM, which is loaded automatically when MS-DOS is started. Internal commands are part of this file. External commands are brought into memory from disk as and when needed. The command processor also executes program files. Executable files have one of the extensions .COM, .EXE, or .BAT. MS-DOS prompts the user to enter commands. The standard prompt is the letter of the current drive followed by the greater-than sign.

16.5.1 File Management
Before we get started with perhaps the most fundamental aspect of the operating system, file management, let's make sure we are at the root directory, a sort of home base. Many commands in the DOS environment follow a certain format. In order to tell DOS to perform a function, at the command prompt you type the command followed by arguments which specify what you want DOS to do. For example:

C:\>copy practice.txt a:

"COPY" is the command that you want DOS to perform. "PRACTICE.TXT A:" are the arguments which specify what will be affected by the command. In this case, DOS will copy the file practice.txt from the C: drive to the A: drive. Commands such as edit, del, and rename require arguments similar to the example listed above.

16.5.2 Change Directory
To move to a different directory, use the command "cd" (for "change directory") followed by the directory you wish to move to. The backslash by itself always represents the root directory.

C:\>cd \

Now let's create a file. At this point it is important to mention that DOS requires that filenames be no longer than eight characters, with the option of a three-character extension for descriptive purposes. Also, no spaces or slashes are acceptable. Again, some of this harkens back to the limitations of earlier days. Since it is going to be a simple text file, let's call our file "practice.txt".

C:\> edit practice.txt

Type your name or some other message. Then press the Alt key, followed by the f key, to display the File menu. (You can also use the mouse.)

Press [ALT] + [F]

Then press the s key to save the file.

Press [S]

To exit, press the Alt key, followed by the f key, followed by the x key.

Press [ALT] + [F] + [X]

If we take a look at the root directory, the PRACTICE.TXT file will be included. DOS sees the new file as C:\PRACTICE.TXT.

16.5.3 Make Directory
Now, let's make a couple of directories so we can put our file away in a place that makes sense to us. The command for making a directory is "md" (for "make directory") and is followed by the name of the directory you wish to make.

C:\> md dosclass (to make the directory DOSCLASS)
C:\> dir (to view the directory with the new DOSCLASS subdirectory)
C:\> cd dosclass (to change to the DOSCLASS directory)
C:\DOSCLASS> md samples (to make the directory "samples" inside the DOSCLASS directory)
C:\DOSCLASS> dir (to view the directory with the new SAMPLES subdirectory)

Note that as soon as we changed our directory (cd dosclass), the prompt changed to represent the new directory. Remember, if you want to get your bearings, you can take a look at the command prompt or display a directory list of the current directory (dir).

16.5.4 Copy
Now that we have created the directories, we can put the practice file we created into the new directory, SAMPLES (C:\DOSCLASS\SAMPLES).
To keep things simple, let's "cd" back to the root directory where the practice file is.

C:\>cd \

And now let's copy that file to the SAMPLES directory which is inside the DOSCLASS directory. In order to copy something, you must first issue the command (copy), then identify the file to be copied (source), and then the directory to which you wish to copy the file (destination).

C:\>copy practice.txt dosclass\samples

A somewhat unfriendly yet useful diagram of the command format would look something like this (where things in brackets are optional):

copy [volume+pathname+]filename [volume+pathname+]directory

What this means is that you don't have to include the volume and pathname of the source file and the destination directory (we didn't in the first example). This is because DOS assumes that any filename or directory included in the copy command is in the current directory. Because we had moved to the root directory, both PRACTICE.TXT and DOSCLASS were in the current directory. But the nice thing about command-line interfaces is that you don't have to be in the directory of the files you wish to act on. The command-line interface makes it possible for the user to use one command to copy any file anywhere to any other location. From any directory (within volume C:), we could have used the following command to copy the same file to the same directory:

C:\DOSCLASS>copy \practice.txt \dosclass\samples

This command would perform the same function as the first command. We just told the computer where to find the source file, since it wasn't in the current directory, by placing a backslash in front of the filenames (which told the computer that the file was in the root directory). This applies to copies between volumes as well. All you have to do is specify the volume:

Z:\ANYWHERE>copy c:\practice.txt c:\dosclass\samples

or, a very common and useful example:

C:\>copy practice.txt a:\backup\dosclass

This command copies the file to a floppy disk in the PC's floppy drive (which already had the directories BACKUP\DOSCLASS).

16.5.5 Delete and Undelete
There is a slight problem now. The PRACTICE.TXT file was copied into the SAMPLES directory, but there is no reason to have two copies of PRACTICE.TXT. To delete a file, use the DEL command (from "delete") followed by the name of the file.

C:\>del practice.txt

Again, you can delete the file from any directory by including the full pathname.

Z:\>del c:\practice.txt

If you accidentally delete something, there is a possibility of retrieving it using the "undelete" command. This, however, will only work for files just deleted.

C:\>undelete

A list of any files that can be undeleted will appear. You will need to replace the first character of the file, because DOS removes the first letter of a deleted file (that's how it keeps track of what can be written over).

16.5.6 Rename
Like the above commands, you needn't be in the same directory as the file you wish to rename, provided you include the pathname of the file you wish to change. But the second argument of this command, the new filename, will not accept a pathname designation. In other words, the second argument should just be a filename (with no pathname):

C:\>ren \dosclass\samples\practice.txt prac.txt

Note: Being able to designate a new path would in effect allow the user to move a file from one place to another without having to copy and delete. A common complaint about DOS is that there is no "move" command.
Therefore, the only way to move files from one location to another is first to copy them and then to delete the originals (a worked example appears at the end of section 16.6).

16.5.7 Parent directory
If you wish to move up one level in the hierarchy of the file structure (change to the parent directory), there is a shortcut--two consecutive periods: ".."

C:\DOSCLASS\SAMPLES>cd ..

This works in a pathname as well:

C:\DOSCLASS\SAMPLES>ren ..\practice.txt prac.txt

If the file PRACTICE.TXT were in the directory above SAMPLES (the DOSCLASS directory), the above command would change it to PRAC.TXT.

16.5.8 F3
The F3 function key can be a time saver if you're prone to typos. Pressing the F3 key will retype the last command for you.

16.5.9 Breaking
Sometimes, you may start a procedure, such as a long directory listing, and wish to stop it before it is completed. The Break command often works when running batch files and other executable files. To stop a procedure, press [CTRL] + [BREAK]. [CTRL] is located in the lower right-hand corner of the keyboard. [BREAK] is located in the upper right-hand corner of the keyboard and is also labeled [PAUSE].

16.5.10 Wildcards (*) and (?)
Another benefit of the command-prompt interface is the ability to use wildcards. If you want, for example, to copy only files with the .txt extension, you employ a wildcard:

C:\>copy *.txt a:\backup\txtfiles

This command would copy all of the files with the .txt extension onto a floppy disk, inside the TXTFILES directory which is inside the BACKUP directory. To copy all files from the C drive to the A drive, you would type:

C:\>copy *.* a:

The wildcard is often used when retrieving a directory of similarly named files, such as:

C:\>dir *.txt

This command would display all files ending with the .txt extension. To list all files that begin with the letter g, you would type the following:

C:\>dir g*.*

Additionally, the ? can be used to substitute for individual letters. If there are many similarly named files that end with the letters GOP but begin with letters A through G, then you can type the following to list those files:

C:\>dir ?gop.*

The following command would list all similar files beginning with the letters REP and ending with A:

C:\>dir rep?a.*

The ? wildcard can be used to replace any letter in any part of the filename. Wildcards such as * and ? can be useful when you do not know the full name of a file or files and wish to list them separately from the main directory listing.

16.6 Executing, Viewing, Editing, Printing
Executing
Binary files ending in .exe are usually "executed" by typing the filename as if it were a command. The following command would execute the WordPerfect application, which appears on the disk directory as WP.EXE:

C:\APPS\WP51>wp

Binary files ending in .com often contain one or more commands for execution either through the command prompt or through some program.

Viewing
Text files, on the other hand, can be viewed quickly with the type command:

C:\>cd dosclass\samples
C:\DOSCLASS\SAMPLES>type practice.txt | more

Editing
Or you can view the file in the provided text editor (just as we did when we first created practice.txt):

C:\DOSCLASS\SAMPLES>edit practice.txt

Printing
If you want to send a text file to the printer, there is the print command. But there are two steps:

C:\DOSCLASS\SAMPLES>print practice.txt
Name of list device [PRN]: lpt2

If you wish to print to a networked printer, usually lpt2 is the correct response. For local printers, the correct response is usually lpt1.
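As noted under Rename above, DOS has no move command, so a move is a copy followed by a delete. Here is a minimal sketch using the sample file created earlier in this lesson (the paths assume the DOSCLASS and SAMPLES directories made in section 16.5.3):

C:\>copy \dosclass\samples\prac.txt \dosclass
C:\>del \dosclass\samples\prac.txt

After these two commands, PRAC.TXT exists only in the DOSCLASS directory, which is exactly what a move would have accomplished.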
16.7 Backup Files
It is possible to lose files by mistake, although the more you practice the less likely it becomes. For your own peace of mind, it is good practice to make backup copies of your most valuable files on a separate diskette. Store your backup disk in a safe place and don't carry it through a metal detector. Use the COPY command to create the backup. There is no need to back up every file you create, only the ones in which you've invested much work. Also, prune your backup diskette every week or two using the ERASE command. Backup files which have been made redundant by subsequent additions will simply create clutter on your backup diskette. An effective file naming convention is essential to keeping track of your backups.

16.8 Other commands

16.8.1 Change the Default Drive
To change the default drive, simply type the letter of the drive of your choice, followed by a colon. The new default will be listed in subsequent DOS prompts.

Example:
C> A: [enter] Changes the default drive from C to A.
A> C: [enter] Changes the default drive from A to C.

[enter] means that you must press the Enter key before the command will execute. [Enter] is required after any DOS command; it is assumed in all commands found below.

16.8.2 Change Directory Command
Once you have located the directory you want, you may move from directory to directory using the CD command (change directory).

Example:
C> cd furniture Moves you to the directory called 'FURNITURE'.
C> cd \furniture\chairs Moves you to the directory called 'CHAIRS' under the directory called 'FURNITURE'.
C> cd .. Moves you up one level in the path.
C> cd \ Takes you back to the root directory (C: in this case).

16.8.3 DIR (Directory) Command
The DIRECTORY command lists the names and sizes of all files located on a particular disk.

Example:
C> dir a: Shows the directory of drive A.
C> dir b: Shows the directory of drive B.
C> dir \agis Shows the files in a subdirectory on drive C (the default).
C> dir Shows the directory of drive C.
C> dir /w Shows the directory in wide format, as opposed to a vertical listing.

All the files are listed on the screen; you can stop the display by typing Ctrl-Break. If you ask for a directory on the A or B drive, be sure there is a diskette in the drive and that the diskette has been formatted. If the drive is empty, or if the diskette is unformatted, DOS will respond with an error message.

16.8.4 ERASE Command
The ERASE command deletes specified files.

Example:
C> erase a:myfile.txt Erases the file MYFILE.TXT from the diskette in the A drive.

If no drive specification is entered, the system looks to delete the specified file from drive C (in this case). IMPORTANT WARNING: This command is easy to use, but it is the most dangerous one you will encounter in DOS (apart from FORMAT). If you aren't careful, you may delete a file which you--or someone else--needs. And, unless you have saved a backup of that file, the erased file is gone for good. For this reason it is good practice to use only complete file specifications with the ERASE command (and to keep backups of your most valuable files). As a safety precaution, never use the wild-card characters '*' and '?' in ERASE commands.

16.8.5 FORMAT Command
You must format new disks before using them on the IBM computers. The format command checks a diskette for flaws and creates a directory where all the names of the diskette's files will be stored.

Example:
C> format a: Formats the diskette in the A drive.

After entering this command, follow the instructions on the screen.
When the FORMAT operation is complete, the system will ask if you wish to FORMAT more diskettes. If you are working with only one diskette, answer N (No) and carry on with your work. If you wish to FORMAT several diskettes, answer Y (Yes) until you have finished formatting all your diskettes. BEWARE: Executing the format command on a diskette which already contains files will result in the deletion of the entire contents of the disk. It is best to execute the format command only on new diskettes. If you format an old diskette, make sure it contains nothing you wish to save.

16.8.6 Rebooting the computer (Ctrl-Alt-Del)
In some cases, when all attempts to recover from a barrage of error messages fail, as a last resort you can reboot the computer. To do this, you press the Control, Alt, and Delete keys all at once. BEWARE: If you re-boot, you may lose some of your work--any data active in RAM which has not yet been saved to disk.

16.8.7 RENAME (REN) Command
The RENAME command permits users to change the name of a file without making a copy of it.

Example:
C> ren a:goofy.txt pluto.txt Changes the name of 'GOOFY.TXT' on the A drive to 'PLUTO.TXT'.

This command is very simple to use; just remember two points: the file name and extension must be complete for the source file, and no drive specification is given for the target. Renaming can only occur on a single disk drive (otherwise COPY must be used).

16.8.8 RMDIR (RD) Remove Directory Command
This command removes a directory. It is only possible to execute this command if the directory you wish to remove is empty.

Example:
C> rd mine Removes the directory called 'MINE'.

16.8.9 Stop Execution (Ctrl-Break)
If you wish to stop the computer in the midst of executing the current command, you may use the key sequence Ctrl-Break. Ctrl-Break does not always work with non-DOS commands. Some software packages block its action in certain situations, but it is worth trying before you re-boot.

16.9 The system view of MS-DOS
The system view of MS-DOS deals with the internal organization of the operating system. On booting, the boot sector is transferred to memory, which then loads the initialization routine (the first portion of DOS) to organize memory, and then loads the next portion of MS-DOS. The resident code and the transient code are then loaded into low and high memory respectively. Interrupt handlers and system calls belong to the resident code, while the command processor and the internal commands belong to the transient code. (A brief illustration of this memory layout appears just before section 16.11.)

16.10 The future of MS-DOS
Early versions of Microsoft Windows were shell programs that ran in DOS. Windows 3.11 extended the shell by going into protected mode and added 32-bit support; these versions were 16-bit/32-bit hybrids. Microsoft Windows 95 further reduced DOS to the role of a boot loader. Windows 98 and Windows Me were the last Microsoft operating systems to run on DOS. The DOS-based branch was eventually abandoned in favor of Windows NT, the first true 32-bit system, which was the foundation for Windows XP and Windows Vista. Windows NT, initially NT OS/2 3.0, was the result of collaboration between Microsoft and IBM to develop a 32-bit operating system with high hardware and software portability. Because of the success of Windows 3.0, Microsoft changed the application programming interface to the extended Windows API, which caused a split between the two companies and a branch in the operating system. IBM would continue to work on OS/2 and the OS/2 API, while Microsoft renamed its operating system Windows NT.
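The memory layout described in section 16.9 can be observed with the MEM command (available from MS-DOS 4.0 onward). The figures below are purely illustrative; the actual numbers depend on the DOS version and on what is loaded:

C:\>mem
655360 bytes total conventional memory
655360 bytes available to MS-DOS
581632 largest executable program size

The gap between the total conventional memory and the largest executable program size reflects the space occupied by the resident portion of DOS and any loaded device drivers.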
16.11 Let us sum up
In this lesson we have learnt about
a) the history of MS-DOS
b) the various versions of MS-DOS

16.12 Points for Discussion
Try to answer the following questions:
a) Explain any five MS-DOS commands
b) Describe the user's view of MS-DOS

16.13 Model Answers to Check your Progress
In order to check your progress, try to answer the following questions:
a) Describe the system view of MS-DOS
b) How do you delete a file?
c) How do you delete a directory?

16.14 Lesson - end activities
After learning this chapter, try to discuss among your friends and answer these questions to check your progress.
a) Discuss the future of MS-DOS
b) Discuss the various versions of MS-DOS

16.15 References
H.M. Deitel, Chapter 19 of “Operating Systems”, Second Edition, Pearson Education, 2001
Andrew S. Tanenbaum, Chapter 8 of “Modern Operating Systems”, PHI, 1996

LESSON – 17: UNIX
CONTENTS
17.1 Aims and Objectives
17.2 Introduction
17.3 History of UNIX
17.4 Hierarchical File System
17.5 The UNIX File System Organization
17.5.1 The bin Directory
17.5.2 The dev Directory
17.5.3 The etc Directory
17.5.4 The lib Directory
17.5.5 The lost+found Directory
17.5.6 The mnt and sys Directories
17.5.7 The tmp Directory
17.5.8 The usr Directory
17.6 Other Miscellaneous Stuff at the Top Level
17.7 Let us sum up
17.8 Points for Discussion
17.9 Model Answers to Check your Progress
17.10 Lesson - end activities
17.11 References

17.1 Aims and Objectives
In this lesson we will learn about the UNIX operating system. The objectives of this lesson are to make the candidate aware of the following
a) History of UNIX
b) Various versions of UNIX
c) UNIX file system organization

17.2 Introduction
The Unix operating system was created more than 30 years ago by a group of researchers at AT&T's Bell Laboratories. During the three decades of constant development that have followed, Unix has found a home in many places, from the ubiquitous mainframe to home computers to the smallest of embedded devices. This lesson provides a brief overview of the history of Unix, discusses some of the differences among the many Unix systems in use today, and covers the fundamental concepts of the basic Unix operating system.

UNIX is a computer operating system, a control program that works with users to run programs, manage resources, and communicate with other computer systems. Several people can use a UNIX computer at the same time; hence UNIX is called a multiuser system. Any of these users can also run multiple programs at the same time; hence UNIX is called multitasking. Because UNIX is such a pastiche—a patchwork of development—it's a lot more than just an operating system. UNIX has more than 250 individual commands. These range from simple commands—for copying a file, for example—to the quite complex: those used in high-speed networking, file revision management, and software development. Most notably, UNIX is a multichoice system. As an example, UNIX has three different primary command-line-based user interfaces (in UNIX, the command-line user interface is called a shell): the Bourne shell, the C shell, and the Korn shell. Often, soon after you learn to accomplish a task with a particular command, you discover there's a second or third way to do that task. This is simultaneously the greatest strength of UNIX and a source of frustration for both new and current users. Why is having all this choice such a big deal? Think about why Microsoft MS-DOS and the Apple Macintosh interfaces are considered so easy to use.
Both are designed to give the user less power. Both have dramatically fewer commands and precious little overlap in commands: You can't use copy to list your files in DOS, and you can't drag a Mac file icon around to duplicate it in its own directory. The advantage to these interfaces is that, in either system, you can learn the one-and-only way to do a task and be confident that you're as sophisticated in doing that task as is the next person. It's easy. It's quick to learn. It's exactly how the experts do it, too. UNIX, by contrast, is much more like a spoken language, with commands acting as verbs, command options (which you learn about later in this lesson) acting as adjectives, and the more complex commands acting akin to sentences. How you do a specific task can, therefore, be completely different from how your UNIX-expert friend does the same task. Worse, some specific commands in UNIX have many different versions, partly because of the variations from different UNIX vendors. (You've heard of these variations and vendors, I'll bet: UNIXWare from Novell, Solaris from Sun, SCO from Santa Cruz, System V Release 4 (pronounce that "system five release four" or, to sound like an ace, "ess-vee-are-four"), and BSD UNIX (pronounced "bee-ess-dee") from the University of California at Berkeley are the primary players. Each is a little different from the others.) Another contributor to the sprawl of modern UNIX is the energy of the UNIX programming community; plenty of UNIX users decide to write a new version of a command in order to solve slightly different problems, thus spawning many versions of a command.

In terms of computers, Unix has a long history. Unix was developed at AT&T's Bell Laboratories after Bell Labs withdrew from a long-term collaboration with General Electric (G.E.) and MIT to create an operating system called MULTICS (Multiplexed Information and Computing Service) for G.E.'s mainframe. In 1969, Bell Labs researchers created the first version of Unix (then called UNICS, or Uniplexed Information and Computing Service), which has evolved into the common Unix systems of today.

Unix was gradually ported to different machine architectures from the original PDP-7 minicomputer and was used by universities. The source code was made available at a small fee to encourage its further adoption. As Unix gained acceptance by universities, students who used it began graduating and moving into positions where they were responsible for purchasing systems and software. When those people began purchasing systems for their companies, they considered Unix because they were familiar with it, spreading adoption further. Since the first days of Unix, the operating system has grown significantly, so that it now forms the backbone of many major corporations' computer systems. Unix is no longer an acronym for anything, but it is derived from the UNICS acronym. Unix developers and users use a lot of acronyms to identify things in the system and for commands.

Unlike DOS, Windows, OS/2, the Macintosh, VMS, MVS, and just about any other operating system, UNIX was designed by a couple of programmers as a fun project, and it evolved through the efforts of hundreds of programmers, each of whom was exploring his or her own ideas of particular aspects of OS design and user interaction. In this regard, UNIX is not like other operating systems, needless to say!
It all started back in the late 1960s in a dark and stormy laboratory deep in the recesses of the American Telephone and Telegraph (AT&T) corporate facility in New Jersey. Working with the Massachusetts Institute of Technology, AT&T Bell Labs was co-developing a massive, monolithic operating system called Multics. On the Bell Labs team were Ken Thompson, Dennis Ritchie, Brian Kernighan, and other people in the Computer Science Research Group who would prove to be key contributors to the new UNIX operating system. When 1969 rolled around, Bell Labs was becoming increasingly disillusioned with Multics, an overly slow and expensive system that ran on General Electric mainframe computers that themselves were expensive to run and rapidly becoming obsolete. The problem was that Thompson and the group really liked the capabilities Multics offered, particularly the individual-user environment and multiple-user aspects.

17.3 History of UNIX
In the 1960s, the Massachusetts Institute of Technology, AT&T Bell Labs, and General Electric worked on an experimental operating system called Multics (Multiplexed Information and Computing Service), which was designed to run on the GE-645 mainframe computer. The aim was the creation of a commercial product, although this was never a great success. Multics was an interactive operating system with many novel capabilities, including enhanced security. The project did develop production releases, but initially these releases performed poorly. AT&T Bell Labs pulled out and deployed its resources elsewhere.

One of the developers on the Bell Labs team, Ken Thompson, continued to develop for the GE-645 mainframe, and wrote a game for that computer called Space Travel. However, he found that the game was too slow on the GE machine and was expensive, costing $75 per execution in scarce computing time. Thompson thus re-wrote the game in assembly language for Digital Equipment Corporation's PDP-7, with help from Dennis Ritchie. This experience, combined with his work on the Multics project, led Thompson to start a new operating system for the PDP-7. Thompson and Ritchie led a team of developers, including Rudd Canaday, at Bell Labs developing a file system as well as the new multi-tasking operating system itself. They included a command line interpreter and some small utility programs.

(Figure: Editing a shell script using the ed editor. The dollar sign at the top of the screen is the prompt printed by the shell; 'ed' is typed to start the editor, which takes over the screen from that point downwards.)

1970s
In 1970 the project was named Unics, and could, eventually, support two simultaneous users. Brian Kernighan invented this name as a contrast to Multics; the spelling was later changed to UNIX. Up until this point there had been no financial support from Bell Labs. When the Computer Science Research Group wanted to use UNIX on a much larger machine than the PDP-7, Thompson and Ritchie managed to trade the promise of adding text processing capabilities to UNIX for a PDP-11/20 machine. This led to some financial support from Bell. In 1970, for the first time, the UNIX operating system was officially named, and it ran on the PDP-11/20. It added a text formatting program called roff and a text editor. All three were written in PDP-11/20 assembly language. Bell Labs used this initial "text processing system", made up of UNIX, roff, and the editor, for text processing of patent applications.
Roff soon evolved into troff, the first electronic publishing program with full typesetting capability.

1973
In 1973, the decision was made to re-write UNIX in the C programming language. The change meant that it was easier to modify UNIX to work on other machines (thus becoming portable), and other developers could create variations. The code was now more concise and compact, leading to accelerated development of UNIX. AT&T made UNIX available to universities and commercial firms, as well as the United States government, under licenses. The licenses included all source code, including the machine-dependent parts of the kernel, which were written in PDP-11 assembly code.

1975
Versions of the UNIX system were determined by editions of its user manuals, so that (for example) "Fifth Edition UNIX" and "UNIX Version 5" have both been used to designate the same thing. Development expanded, with Versions 4, 5, and 6 being released by 1975. These versions added the concept of pipes, leading to the development of a more modular code base and increasing development speed still further. Version 5 and especially Version 6 led to a plethora of different Unix versions both inside and outside Bell Labs, including PWB/UNIX, IS/1 (the first commercial Unix), and the University of Wollongong's port to the Interdata 7/32 (the first non-PDP Unix).

1978
In 1978, UNIX/32V, for the VAX system, was released. By this time, over 600 machines were running UNIX in some form.

1979
Version 7 UNIX, the last version of Research Unix to be released widely, was released in 1979. Versions 8, 9 and 10 were developed through the 1980s but were only released to a few universities, though they did generate papers describing the new work. This research led to the development of Plan 9 from Bell Labs, a new portable distributed system.

1980s
(Figure: A late-80s style UNIX desktop running the X Window System graphical user interface. Shown are a number of client applications common to the MIT X Consortium's distribution, including Tom's Window Manager, an X Terminal, Xbiff, xload, and a graphical manual page browser.)

AT&T now licensed UNIX System III, based largely on Version 7, for commercial use, the first version launching in 1982. This also included support for the VAX. AT&T continued to issue licenses for older UNIX versions. To end the confusion between all its differing internal versions, AT&T combined them into UNIX System V Release 1. This introduced a few features such as the vi editor and curses from the Berkeley Software Distribution of Unix, developed at the University of California, Berkeley. This also included support for the Western Electric 3B series of machines. Since the newer commercial UNIX licensing terms were not as favorable for academic use as the older versions of Unix, the Berkeley researchers continued to develop BSD Unix as an alternative to UNIX System III and V, originally on the PDP-11 architecture (the 2.xBSD releases, ending with 2.11BSD) and later for the VAX-11 (the 4.xBSD releases). Many contributions to UNIX first appeared on BSD systems, notably the C shell with job control (modeled on ITS). Perhaps the most important aspect of the BSD development effort was the addition of TCP/IP network code to the mainstream UNIX kernel.
The BSD effort produced several significant releases that contained network code: 4.1cBSD, 4.2BSD, 4.3BSD, 4.3BSD-Tahoe ("Tahoe" being the nickname of the CCI Power 6/32 architecture that was the first non-DEC release of the BSD kernel), Net/1, 4.3BSD-Reno (named to match the "Tahoe" naming, and because the release was something of a gamble), Net/2, 4.4BSD, and 4.4BSD-lite. The network code found in these releases is the ancestor of much of the TCP/IP network code in use today, including code that was later released in AT&T System V UNIX and early versions of Microsoft Windows. The accompanying Berkeley Sockets API is a de facto standard for networking APIs and has been copied on many platforms.

1982
Other companies began to offer commercial versions of the UNIX System for their own mini-computers and workstations. Most of these new UNIX flavors were developed from the System V base under a license from AT&T; however, others were based on BSD instead. One of the leading developers of BSD, Bill Joy, went on to co-found Sun Microsystems in 1982 and create SunOS (now Solaris) for their workstation computers. In 1980, Microsoft announced its first Unix for 16-bit microcomputers, called Xenix, which the Santa Cruz Operation (SCO) ported to the Intel 8086 processor in 1983, and eventually branched Xenix into SCO UNIX in 1989. For a few years during this period (before PC-compatible computers with MS-DOS became dominant), industry observers expected that UNIX, with its portability and rich capabilities, was likely to become the industry standard operating system for microcomputers.

1984
In 1984 several companies established the X/Open consortium with the goal of creating an open system specification based on UNIX. Despite early progress, the standardization effort collapsed into the "Unix wars", with various companies forming rival standardization groups. The most successful Unix-related standard turned out to be the IEEE's POSIX specification, designed as a compromise API readily implemented on both BSD and System V platforms, published in 1988 and soon mandated by the United States government for many of its own systems. Between 1987 and 1989, AT&T added various features to UNIX System V, such as file locking, system administration, streams, new forms of IPC, the Remote File System, and TLI. AT&T cooperated with Sun Microsystems and, between 1987 and 1989, merged features from Xenix, BSD, SunOS, and System V into System V Release 4 (SVR4), independently of X/Open. This new release consolidated all the previous features into one package and heralded the end of competing versions. It also increased licensing fees.

(Figure: The Common Desktop Environment, or CDE, a graphical desktop for UNIX co-developed in the 1990s by HP, IBM, and Sun as part of the COSE initiative.)

1990s
In 1990, the Open Software Foundation released OSF/1, their standard UNIX implementation, based on Mach and BSD. The Foundation was started in 1988 and was funded by several Unix-related companies that wished to counteract the collaboration of AT&T and Sun on SVR4. Subsequently, AT&T and another group of licensees formed the group "UNIX International" in order to counteract OSF. This escalation of conflict between competing vendors gave rise again to the phrase "Unix wars".

1991
In 1991, a group of BSD developers (Donn Seeley, Mike Karels, Bill Jolitz, and Trent Hein) left the University of California to found Berkeley Software Design, Inc. (BSDI).
BSDI produced a fully functional commercial version of BSD Unix for the inexpensive and ubiquitous Intel platform, which started a wave of interest in the use of inexpensive hardware for production computing. Shortly after it was founded, Bill Jolitz left BSDI to pursue distribution of 386BSD, the free software ancestor of FreeBSD, OpenBSD, and NetBSD.

1993
By 1993, most commercial vendors had changed their variants of UNIX to be based on System V, with many BSD features added on top. The creation of the COSE initiative that year by the major players in UNIX marked the end of the most notorious phase of the UNIX wars, and was followed by the merger of UI and OSF in 1994. The new combined entity, which retained the OSF name, stopped work on OSF/1 that year. By that time the only vendor using it was Digital, which continued its own development, rebranding its product Digital UNIX in early 1995. Shortly after UNIX System V Release 4 was produced, AT&T sold all its rights to UNIX® to Novell. (Dennis Ritchie likened this to the Biblical story of Esau selling his birthright for the proverbial "mess of pottage".) Novell developed its own version, UnixWare, merging its NetWare with UNIX System V Release 4. Novell tried to use this to battle against Windows NT, but their core markets suffered considerably. In 1993, Novell decided to transfer the UNIX® trademark and certification rights to the X/Open Consortium. In 1996, X/Open merged with OSF, creating the Open Group. Various standards by the Open Group now define what is and what is not a "UNIX" operating system, notably the post-1998 Single UNIX Specification.

1995
In 1995, the business of administering and supporting the existing UNIX licenses, plus rights to further develop the System V code base, was sold by Novell to the Santa Cruz Operation. Whether Novell also sold the copyrights is currently the subject of litigation (see below).

1997
In 1997, Apple Computer sought out a new foundation for its Macintosh operating system and chose NEXTSTEP, an operating system developed by NeXT. The core operating system was renamed Darwin after Apple acquired it. It was based on the BSD family and the Mach kernel. The deployment of Darwin BSD Unix in Mac OS X makes it, according to a statement made by an Apple employee at a USENIX conference, the most widely used Unix-based system in the desktop computer market.

2000 to present
In 2000, SCO sold its entire UNIX business and assets to Caldera Systems, which later changed its name to The SCO Group. This new player then started legal action against various users and vendors of Linux. SCO has alleged that Linux contains copyrighted UNIX code now owned by The SCO Group. Other allegations include trade-secret violations by IBM, and contract violations by former Santa Cruz customers who have since converted to Linux. However, Novell disputed the SCO Group's claim to hold copyright on the UNIX source base. According to Novell, SCO (and hence the SCO Group) are effectively franchise operators for Novell, which also retained the core copyrights, veto rights over future licensing activities of SCO, and 95% of the licensing revenue. The SCO Group disagreed with this, and the dispute resulted in the SCO v. Novell lawsuit. In 2005, Sun Microsystems released the bulk of its Solaris system code (based on UNIX System V Release 4) into an open source project called OpenSolaris.
New Sun OS technologies, such as the ZFS file system, are now first released as open source code via the OpenSolaris project; as of 2006 it has spawned several non-Sun distributions such as SchilliX, Belenix, Nexenta, and MarTux. The dot-com crash led to significant consolidation of UNIX users as well. Of the many commercial flavors of UNIX that were born in the 1980s, only Solaris, HP-UX, and AIX are still doing relatively well in the market, though SGI's IRIX persisted for quite some time. Of these, Solaris has the most market share, and may be gaining popularity due to its feature set and also because it now has an open source version.

17.4 Hierarchical File System
In a nutshell, a hierarchy is a system organized by graded categorization. A familiar example is the organizational structure of a company, where workers report to supervisors and supervisors report to middle managers. Middle managers, in turn, report to senior managers, and senior managers report to vice-presidents, who report to the president of the company. Graphically, this hierarchy looks like Figure 1.1.

Figure 1.1. A typical organizational hierarchy

You've doubtless seen this type of illustration before, and you know that a higher position indicates more control. Each position is controlled by the next highest position or row. The president is top dog of the organization, but each subsequent manager is also in control of his or her own small fiefdom. To understand how a file system can have a similar organization, simply imagine each of the managers in the illustration as a "file folder" and each of the employees as a piece of paper, filed in a particular folder. Open any file cabinet, and you probably see things organized this way: filed papers are placed in labeled folders, and often these folders are filed in groups under specific topics. The drawer might then have a specific label to distinguish it from other drawers in the cabinet, and so on. That's exactly what a hierarchical file system is all about. You want to have your files located in the most appropriate place in the file system, whether at the very top, in a folder, or in a nested series of folders. With careful usage, a hierarchical file system can contain hundreds or thousands of files and still allow users to find any individual file quickly. On my computer, the chapters of this book are organized in a hierarchical fashion, as shown in Figure 1.2.

Figure 1.2. File organization for the chapters of Teach

17.5 The UNIX File System Organization
A key concept enabling the UNIX hierarchical file system to be so effective is that anything that is not a folder is a file. Programs are files in UNIX, device drivers are files, documents and spreadsheets are files, your keyboard is represented as a file, your display is a file, and even your tty line and mouse are files. What this means is that as UNIX has developed, it has avoided becoming an ungainly mess. UNIX does not have hundreds of cryptic files stuck at the top (this is still a problem in DOS) or tucked away in confusing folders within the System Folder (as with the Macintosh). The top level of the UNIX file structure (/) is known as the root directory or slash directory, and it always has a certain set of subdirectories, including bin, dev, etc, lib, mnt, tmp, and usr. There can be a lot more, however. Listing 1.1 shows files found at the top level of the mentor file system (the system I work on). Typical UNIX directories are shown followed by a slash in the listing.
AA         boot     gendynix      rf/          userb/    var/
OLD/       core     lib/          stand/       userc/
archive/   dev/     lost+found/   sys/         users/
ats/       diag/    mnt/          tftpboot/    usere/
backup/    dynix    net/          tmp/         users/
bin/       etc/     flags/        usera/       usr/

You can obtain a listing of the files and directories in your own top-level directory by using the ls -C -F / command. (You'll learn all about the ls command in the next hour. For now, just be sure that you enter exactly what's shown in the example.) On a different computer system, here's what I see when I enter that command:

% ls -C -F /
Mail/        export/        public/
News/        home/          reviews/
add_swap/    kadb*          sbin/
apps/        layout         sys@
archives/    lib@           tftpboot/
bin@         lost+found/    tmp/
boot         mnt/           usr/
cdrom/       net/           utilities/
chess/       news/          var/
dev/         nntpserver     vmunix*
etc/         pcfs/

In this example, any filename that ends with a slash (/) is a folder (UNIX calls these directories). Any filename that ends with an asterisk (*) is a program. Anything ending with an at sign (@) is a symbolic link, and everything else is a normal, plain file. As you can see from these two examples, and as you'll immediately find when you try the command yourself, there is much variation in how different UNIX systems organize the top-level directory. There are some directories and files in common, and once you start examining the contents of specific directories, you'll find that hundreds of programs and files always show up in the same place from UNIX to UNIX. It's as if you were working as a file clerk at a new law firm. Although this firm might have a specific approach to filing information, the approach may be similar to the filing system of other firms where you have worked in the past. If you know the underlying organization, you can quickly pick up the specifics of a particular organization. Try the command ls -C -F / on your computer system, and identify, as previously explained, each of the directories in your resultant listing. The output of the ls command shows the files and directories in the top level of your system. Next, you learn what they are.

17.5.1 The bin Directory
In UNIX parlance, programs are considered executables because users can execute them. (In this case, execute is a synonym for run, not an indication that you get to wander about murdering innocent applications!) When a program has been compiled (usually from a C listing), it is translated into what's called a binary format. Add the two together, and you have a common UNIX description for an application—an executable binary. It's no surprise that the original UNIX developers decided to have a directory labeled "binaries" to store all the executable programs on the system. Remember the primitive teletypewriter discussed in the last hour? Having a slow system to talk with the computer had many ramifications that you might not expect. The single most obvious one was that everything became quite concise. There were no lengthy words like binaries or listfiles, but rather succinct abbreviations: bin and ls are, respectively, the UNIX equivalents. The bin directory is where all the executable binaries were kept in early UNIX. Over time, as more and more executables were added to UNIX, having all the executables in one place proved unmanageable, and the bin directory split into multiple parts (/bin, /sbin, /usr/bin).

17.5.2 The dev Directory
Among the most important portions of any computer are its device drivers. Without them, you wouldn't have any information on your screen (the information arrives courtesy of the display device driver).
You wouldn't be able to enter information (the information is read and given to the system by the keyboard device driver), and you wouldn't be able to use your floppy disk drive (managed by the floppy device driver). Earlier, you learned how almost anything in UNIX is considered a file in the file system, and the dev directory is an example. All device drivers—often numbering into the hundreds—are stored as separate files in the standard UNIX dev (devices) directory. Pronounce this directory name "dev," not "dee-ee-vee."

17.5.3 The etc Directory
UNIX administration can be quite complex, involving management of user accounts, the file system, security, device drivers, hardware configurations, and more. To help, UNIX designates the etc directory as the storage place for all administrative files and information. Pronounce the directory name either "ee-tea-sea", "et-sea," or "etcetera." All three pronunciations are common.

17.5.4 The lib Directory
Like your neighborhood library, UNIX has a central storage place for function and procedural libraries. These specific executables are included with specific programs, allowing programs to offer features and capabilities otherwise unavailable. The idea is that if programs want to include certain features, they can reference just the shared copy of that utility in the UNIX library rather than having a new, unique copy. Many of the more recent UNIX systems also support what's called dynamic linking, where the library of functions is included on-the-fly as you start up the program. The wrinkle is that instead of the library reference being resolved when the program is created, it's resolved only when you actually run the program itself. Pronounce the directory name "libe" or "lib" (to rhyme with the word bib).

17.5.5 The lost+found Directory
With multiple users running many different programs simultaneously, it's been a challenge over the years to develop a file system that can remain synchronized with the activity of the computer. Various parts of the UNIX kernel—the brains of the system—help with this problem. When files are recovered after any sort of problem or failure, they are placed here, in the lost+found directory, if the kernel cannot ascertain their proper location in the file system. This directory should be empty almost all the time. This directory is commonly pronounced "lost and found" rather than "lost plus found."

17.5.6 The mnt and sys Directories
The mnt (pronounced "em-en-tea") and sys (pronounced "sis") directories can also safely be ignored by UNIX users. The mnt directory is intended to be a common place to mount external media—hard disks, removable cartridge drives, and so on—in UNIX. On many systems, though not all, sys contains files indicating the system configuration.

17.5.7 The tmp Directory
A directory that you can't ignore, the tmp directory—say "temp"—is used by many of the programs in UNIX as a temporary file-storage space. If you're editing a file, for example, the program makes a copy of the file and saves it in tmp, and you work directly with that, saving the new file back to your original file only when you've completed your work. On most systems, tmp ends up littered with various files and executables left by programs that don't remove their own temporary files. On one system I use, it's not uncommon to find 10-30 megabytes of files wasting space here. Even so, if you're manipulating files or working with copies of files, tmp is the best place to keep the temporary copies of files.
Indeed, on some UNIX workstations, tmp actually can be the fastest device on the computer, allowing for dramatic performance improvements over working with files directly in your home directory.

17.5.8 The usr Directory
Finally, the last of the standard directories at the top level of the UNIX file system hierarchy is the usr—pronounced "user"—directory. Originally, this directory was intended to be the central storage place for all user-related commands. Today, however, many companies have their own interpretation, and there's no telling what you'll find in this directory.

17.6 Other Miscellaneous Stuff at the Top Level
Besides all the directories previously listed, a number of other directories and files commonly occur in UNIX systems. Some files might have slight variations in name on your computer, so when you compare your listing to the following files and directories, be alert for possible alternative spellings. A file you must have to bring up UNIX at all is one usually called unix or vmunix, or named after the specific version of UNIX on the computer. The file contains the actual UNIX operating system. The file must have a specific name and must be found at the top level of the file system. Hand-in-hand with the operating system is another file called boot, which helps during initial startup of the hardware. Notice in one of the previous listings that the files boot and dynix appear. (DYNIX is the name of the particular variant of UNIX used on Sequent computers.) By comparison, the listing from the Sun Microsystems workstation shows boot and vmunix as the two files. Another directory that you might find in your own top-level listing is diag—pronounced "dye-ag"—which acts as a storehouse for diagnostic and maintenance programs. If you have any programs within this directory, it's best not to try them out without proper training! The home directory, also sometimes called users, is a central place for organizing all files unique to a specific user. Listing this directory is usually an easy way to find out what accounts are on the system, too, because by convention each individual account directory is named after the user's account name. On one system I use, my account is taylor, and my individual account directory is also called taylor. Home directories are always created by the system administrator. The net directory, if set up correctly, is a handy shortcut for accessing other computers on your network. The tftpboot directory is a relatively new feature of UNIX. The letters stand for "trivial file transfer protocol boot." Don't let the name confuse you, though; this directory contains versions of the kernel suitable for X Window System-based terminals and diskless workstations to run UNIX. Some UNIX systems have directories named for specific types of peripherals that can be attached. On the Sun workstation, you can see examples with the directories cdrom and pcfs. The former is for a CD-ROM drive and the latter for DOS-format floppy disks. There are many more directories in UNIX, but this will give you an idea of how things are organized.
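A quick way to check which of these standard directories exist on your own system is to hand ls the names directly, with the -d flag so it lists the directories themselves rather than their contents. This is only a sketch: the set of directories present, and the exact output, will vary from one UNIX variant to another.

% ls -C -F -d /bin /dev /etc /lib /tmp /usr
/bin/   /dev/   /etc/   /lib/   /tmp/   /usr/

If one of the names does not exist on your system, ls simply prints a complaint for that name and lists the rest.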
17.7 Let us sum up

In this lesson we have learnt about
a) the history of UNIX
b) UNIX file system organization

17.8 Points for Discussion

Try to discuss the following
a) Evolution of UNIX
b) Future of UNIX

17.9 Model Answers to Check your Progress

In order to check your progress, try to explain the uses of the following directories
a) bin directory
b) dev directory
c) lib directory

17.10 Lesson - end activities

After learning this chapter, try to discuss among your friends and answer these questions to check your progress.
a) Discuss UNIX file system organization
b) Discuss the various versions of UNIX

17.11 References

H.M. Deitel, Chapter 18 of "Operating Systems", Second Edition, Pearson Education, 2001
Andrew S. Tanenbaum, Chapter 7 of "Modern Operating Systems", PHI, 1996

LESSON – 18: KERNEL AND SHELL

CONTENTS
18.1 Aims and Objectives
18.2 Introduction
18.2.1 The kernel
18.2.2 The shell
18.3 Simple commands
18.4 Background commands
18.5 Input/output redirection
18.6 Pipelines and filters
18.7 File name generation
18.8 Quoting
18.9 Prompting
18.10 Shell procedures
18.11 Control Flow
18.12 Shell variables
18.13 The test command
18.14 Other Control flows
18.15 Command grouping
18.16 Debugging shell procedures
18.17 Other important commands
18.18 Let us sum up
18.19 Points for Discussion
18.20 Model Answers to Check your Progress
18.21 Lesson - end activities
18.22 References

18.1 Aims and Objectives

In this lesson we will learn about the kernel and the shell. The objectives of this lesson are to make the candidate aware of the following
a) the kernel
b) the shell
c) and some important shell commands

18.2 Introduction

In this section we discuss the kernel and the shell.

18.2.1 The kernel

The kernel of UNIX is the hub of the operating system: it allocates time and memory to programs and handles the filestore and communications in response to system calls. The kernel is responsible for carrying out all fundamental low-level system operations, like scheduling processes, opening and closing files, and sending instructions to the actual hardware CPU chips that process your data. There is only one kernel and it is the heart of the machine.

As an illustration of the way that the shell and the kernel work together, suppose a user types rm myfile (which has the effect of removing the file myfile). The shell searches the filestore for the file containing the program rm, and then requests the kernel, through system calls, to execute the program rm on myfile. When the process rm myfile has finished running, the shell then returns the UNIX prompt % to the user, indicating that it is waiting for further commands.

18.2.2 The shell

The shell acts as an interface between the user and the kernel. When a user logs in, the login program checks the username and password, and then starts another program called the shell. The shell is a command line interpreter (CLI). It interprets the commands the user types in and arranges for them to be carried out. The commands are themselves programs: when they terminate, the shell gives the user another prompt (% on our systems). The adept user can customise his/her own shell, and users can use different shells on the same machine. Staff and students in the school have the tcsh shell by default. The tcsh shell has certain features to help the user inputting commands.

Filename Completion - By typing part of the name of a command, filename or directory and pressing the [Tab] key, the tcsh shell will complete the rest of the name automatically.
If the shell finds more than one name beginning with the letters you have typed, it will beep, prompting you to type a few more letters before pressing the tab key again.

History - The shell keeps a list of the commands you have typed in. If you need to repeat a command, use the cursor keys to scroll up and down the list or type history for a list of previous commands.

The shell is both a command language and a programming language that provides an interface to the UNIX operating system.

18.3 Simple commands

Simple commands consist of one or more words separated by blanks. The first word is the name of the command to be executed; any remaining words are passed as arguments to the command. For example,
who
is a command that prints the names of users logged in. The command
ls -l
prints a list of files in the current directory. The argument -l tells ls to print status information, size and the creation date for each file.

18.4 Background commands

To execute a command the shell normally creates a new process and waits for it to finish. A command may be run without waiting for it to finish. For example,
cc pgm.c &
calls the C compiler to compile the file pgm.c. The trailing & is an operator that instructs the shell not to wait for the command to finish. To help keep track of such a process the shell reports its process number following its creation. A list of currently active processes may be obtained using the ps command.

18.5 Input/output redirection

Most commands produce output on the standard output that is initially connected to the terminal. This output may be sent to a file by writing, for example,
ls -l >file
The notation >file is interpreted by the shell and is not passed as an argument to ls. If file does not exist then the shell creates it; otherwise the original contents of file are replaced with the output from ls. Output may be appended to a file using the notation
ls -l >>file
In this case file is also created if it does not already exist.

The standard input of a command may be taken from a file instead of the terminal by writing, for example,
wc <file
The command wc reads its standard input (in this case redirected from file) and prints the number of characters, words and lines found. If only the number of lines is required then
wc -l <file
could be used.

18.6 Pipelines and filters

The standard output of one command may be connected to the standard input of another by writing the `pipe' operator, indicated by |, as in,
ls -l | wc
Two commands connected in this way constitute a pipeline and the overall effect is the same as
ls -l >file; wc <file
except that no file is used. Instead the two processes are connected by a pipe and are run in parallel. Pipes are unidirectional and synchronization is achieved by halting wc when there is nothing to read and halting ls when the pipe is full.

A filter is a command that reads its standard input, transforms it in some way, and prints the result as output. One such filter, grep, selects from its input those lines that contain some specified string. For example,
ls | grep old
prints those lines, if any, of the output from ls that contain the string old. Another useful filter is sort. For example,
who | sort
will print an alphabetically sorted list of logged in users. A pipeline may consist of more than two commands, for example,
ls | grep old | wc -l
prints the number of file names in the current directory containing the string old.
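As a further sketch of the filter idea (assuming the standard awk, sort and uniq filters are available on your system), the following pipeline counts how many terminal sessions each logged-in user has:

who | awk '{print $1}' | sort | uniq -c

who produces one line per session; awk keeps only the first field, the user name; sort brings identical names together; and uniq -c replaces each run of identical lines with a single line prefixed by a count.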
18.7 File name generation

Many commands accept arguments which are file names. For example,
ls -l main.c
prints information relating to the file main.c. The shell provides a mechanism for generating a list of file names that match a pattern. For example,
ls -l *.c
generates, as arguments to ls, all file names in the current directory that end in .c. The character * is a pattern that will match any string including the null string. In general patterns are specified as follows.

*      Matches any string of characters including the null string.
?      Matches any single character.
[...]  Matches any one of the characters enclosed. A pair of characters separated by a minus will match any character lexically between the pair.

For example,
[a-z]*
matches all names in the current directory beginning with one of the letters a through z.
/usr/fred/test/?
matches all names in the directory /usr/fred/test that consist of a single character. If no file name is found that matches the pattern then the pattern is passed, unchanged, as an argument.

This mechanism is useful both to save typing and to select names according to some pattern. It may also be used to find files. For example,
echo /usr/fred/*/core
finds and prints the names of all core files in sub-directories of /usr/fred. (echo is a standard UNIX command that prints its arguments, separated by blanks.) This last feature can be expensive, requiring a scan of all sub-directories of /usr/fred.

There is one exception to the general rules given for patterns. The character `.' at the start of a file name must be explicitly matched.
echo *
will therefore echo all file names in the current directory not beginning with `.'.
echo .*
will echo all those file names that begin with `.'. This avoids inadvertent matching of the names `.' and `..' which mean `the current directory' and `the parent directory' respectively. (Notice that ls suppresses information for the files `.' and `..'.)

18.8 Quoting

Characters that have a special meaning to the shell, such as < > * ? | &, are called metacharacters. Any character preceded by a \ is quoted and loses its special meaning, if any. The \ is elided so that
echo \?
will echo a single ?, and
echo \\
will echo a single \. To allow long strings to be continued over more than one line the sequence \newline is ignored.

\ is convenient for quoting single characters. When more than one character needs quoting the above mechanism is clumsy and error prone. A string of characters may be quoted by enclosing the string between single quotes. For example,
echo xx'****'xx
will echo
xx****xx
The quoted string may not contain a single quote but may contain newlines, which are preserved. This quoting mechanism is the most simple and is recommended for casual use. A third quoting mechanism using double quotes is also available that prevents interpretation of some but not all metacharacters.

18.9 Prompting

When the shell is used from a terminal it will issue a prompt before reading a command. By default this prompt is `$ '. It may be changed by saying, for example,
PS1=yesdear
that sets the prompt to be the string yesdear. If a newline is typed and further input is needed then the shell will issue the prompt `> '. Sometimes this can be caused by mistyping a quote mark. If it is unexpected then an interrupt (DEL) will return the shell to read another command. This prompt may be changed by saying, for example,
PS2=more
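The difference between the three quoting mechanisms can be seen with a variable such as HOME, which the shell sets to the name of your login directory (a sketch; the directory shown is hypothetical):

echo $HOME
prints /usr/fred,
echo '$HOME'
prints the literal string $HOME, and
echo "$HOME"
prints /usr/fred again. Single quotes suppress all substitution; double quotes still allow parameter and command substitution but protect most other metacharacters.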
The shell and login

Following login (1) the shell is called to read and execute commands typed at the terminal. If the user's login directory contains the file .profile then it is assumed to contain commands and is read by the shell before reading any commands from the terminal.

18.10 Shell procedures

The shell may be used to read and execute commands contained in a file. For example,
sh file [ args ... ]
calls the shell to read commands from file. Such a file is called a command procedure or shell procedure. Arguments may be supplied with the call and are referred to in file using the positional parameters $1, $2, .... For example, if the file wg contains
who | grep $1
then
sh wg fred
is equivalent to
who | grep fred

UNIX files have three independent attributes, read, write and execute. The UNIX command chmod (1) may be used to make a file executable. For example,
chmod +x wg
will ensure that the file wg has execute status. Following this, the command
wg fred
is equivalent to
sh wg fred
This allows shell procedures and programs to be used interchangeably. In either case a new process is created to run the command.

As well as providing names for the positional parameters, the number of positional parameters in the call is available as $#. The name of the file being executed is available as $0. A special shell parameter $* is used to substitute for all positional parameters except $0. A typical use of this is to provide some default arguments, as in,
nroff -T450 -ms $*
which simply prepends some arguments to those already given.

18.11 Control Flow

Control flow - for

A frequent use of shell procedures is to loop through the arguments ($1, $2, ...) executing commands once for each argument. An example of such a procedure is tel that searches the file /usr/lib/telnos that contains lines of the form
...
fred mh0123
bert mh0789
...
The text of tel is
for i do grep $i /usr/lib/telnos; done
The command
tel fred
prints those lines in /usr/lib/telnos that contain the string fred.
tel fred bert
prints those lines containing fred followed by those for bert.

The for loop notation is recognized by the shell and has the general form
for name in w1 w2 ...
do command-list
done
A command-list is a sequence of one or more simple commands separated or terminated by a newline or semicolon. Furthermore, reserved words like do and done are only recognized following a newline or semicolon. name is a shell variable that is set to the words w1 w2 ... in turn each time the command-list following do is executed. If in w1 w2 ... is omitted then the loop is executed once for each positional parameter; that is, in $* is assumed.

Another example of the use of the for loop is the create command whose text is
for i do >$i; done
The command
create alpha beta
ensures that two empty files alpha and beta exist. The notation >file may be used on its own to create or clear the contents of a file. Notice also that a semicolon (or newline) is required before done.

Control flow - case

A multiple way branch is provided for by the case notation. For example,
case $# in
1) cat >>$1 ;;
2) cat >>$2 <$1 ;;
*) echo \'usage: append [ from ] to\' ;;
esac
is an append command. When called with one argument as
append file
$# is the string 1 and the standard input is copied onto the end of file using the cat command.
append file1 file2
appends the contents of file1 onto file2. If the number of arguments supplied to append is other than 1 or 2 then a message is printed indicating proper usage.
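In the same spirit, a case on $# combines naturally with a for loop. The following sketch (the procedure name lookup is hypothetical, reusing the telnos file from the tel example) refuses to run without arguments and then processes each one:

case $# in
0) echo \'usage: lookup name ...\' ;;
*) for i
   do grep $i /usr/lib/telnos
   done ;;
esac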
The general form of the case command is
case word in
pattern) command-list;;
...
esac
The shell attempts to match word with each pattern, in the order in which the patterns appear. If a match is found the associated command-list is executed and execution of the case is complete. Since * is the pattern that matches any string it can be used for the default case.

A word of caution: no check is made to ensure that only one pattern matches the case argument. The first match found defines the set of commands to be executed. In the example below the commands following the second * will never be executed.
case $# in
*) ... ;;
*) ... ;;
esac

Another example of the use of the case construction is to distinguish between different forms of an argument. The following example is a fragment of a cc command.
for i
do case $i in
   -[ocs]) ... ;;
   -*) echo \'unknown flag $i\' ;;
   *.c) /lib/c0 $i ... ;;
   *) echo \'unexpected argument $i\' ;;
   esac
done

To allow the same commands to be associated with more than one pattern the case command provides for alternative patterns separated by a |. For example,
case $i in
-x|-y) ...
esac
is equivalent to
case $i in
-[xy]) ...
esac
The usual quoting conventions apply so that
case $i in
\?) ...
will match the character ?.

18.12 Shell variables

The shell provides string-valued variables. Variable names begin with a letter and consist of letters, digits and underscores. Variables may be given values by writing, for example,
user=fred box=m000 acct=mh0000
which assigns values to the variables user, box and acct. A variable may be set to the null string by saying, for example,
null=
The value of a variable is substituted by preceding its name with $; for example,
echo $user
will echo fred.

Variables may be used interactively to provide abbreviations for frequently used strings. For example,
b=/usr/fred/bin
mv pgm $b
will move the file pgm from the current directory to the directory /usr/fred/bin. A more general notation is available for parameter (or variable) substitution, as in,
echo ${user}
which is equivalent to
echo $user
and is used when the parameter name is followed by a letter or digit. For example,
tmp=/tmp/ps
ps a >${tmp}a
will direct the output of ps to the file /tmp/psa, whereas,
ps a >$tmpa
would cause the value of the variable tmpa to be substituted.

Except for $? the following are set initially by the shell. $? is set after executing each command.

$?  The exit status (return code) of the last command executed as a decimal string. Most commands return a zero exit status if they complete successfully, otherwise a non-zero exit status is returned. Testing the value of return codes is dealt with later under if and while commands.
$#  The number of positional parameters (in decimal). Used, for example, in the append command to check the number of parameters.
$$  The process number of this shell (in decimal). Since process numbers are unique among all existing processes, this string is frequently used to generate unique temporary file names. For example,
    ps a >/tmp/ps$$
    ...
    rm /tmp/ps$$
$!  The process number of the last process run in the background (in decimal).
$-  The current shell flags, such as -x and -v.

Some variables have a special meaning to the shell and should be avoided for general use.

$MAIL  When used interactively the shell looks at the file specified by this variable before it issues a prompt. If the specified file has been modified since it was last looked at the shell prints the message you have mail before prompting for the next command. This variable is typically set in the file .profile, in the user's login directory. For example,
MAIL=/usr/mail/fred
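As a sketch of these parameters in combination (the search string is arbitrary), the following fragment uses $$ to make a temporary file name unique and $? to report how the search fared:

ps a >/tmp/ps$$
grep fred /tmp/ps$$
echo the grep returned exit status $?
rm /tmp/ps$$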
$HOME  The default argument for the cd command. The current directory is used to resolve file name references that do not begin with a /, and is changed using the cd command. For example,
cd /usr/fred/bin
makes the current directory /usr/fred/bin.
cat wn
will print on the terminal the file wn in this directory. The command cd with no argument is equivalent to
cd $HOME
This variable is also typically set in the user's login profile.

$PATH  A list of directories that contain commands (the search path). Each time a command is executed by the shell a list of directories is searched for an executable file. If $PATH is not set then the current directory, /bin, and /usr/bin are searched by default. Otherwise $PATH consists of directory names separated by :. For example,
PATH=:/usr/fred/bin:/bin:/usr/bin
specifies that the current directory (the null string before the first :), /usr/fred/bin, /bin and /usr/bin are to be searched in that order. In this way individual users can have their own `private' commands that are accessible independently of the current directory. If the command name contains a / then this directory search is not used; a single attempt is made to execute the command.

$PS1  The primary shell prompt string, by default, `$ '.
$PS2  The shell prompt when further input is needed, by default, `> '.
$IFS  The set of characters used by blank interpretation.

18.13 The test command

The test command, although not part of the shell, is intended for use by shell programs. For example,
test -f file
returns zero exit status if file exists and non-zero exit status otherwise. In general test evaluates a predicate and returns the result as its exit status. Some of the more frequently used test arguments are given here; see test (1) for a complete specification.

test s        true if the argument s is not the null string
test -f file  true if file exists
test -r file  true if file is readable
test -w file  true if file is writable
test -d file  true if file is a directory

18.14 Other Control flows

Control flow - while

The actions of the for loop and the case branch are determined by data available to the shell. A while or until loop and an if then else branch are also provided whose actions are determined by the exit status returned by commands. A while loop has the general form
while command-list1
do command-list2
done
The value tested by the while command is the exit status of the last simple command following while. Each time round the loop command-list1 is executed; if a zero exit status is returned then command-list2 is executed; otherwise, the loop terminates. For example,
while test $1
do ...
   shift
done
is equivalent to
for i
do ...
done
shift is a shell command that renames the positional parameters $2, $3, ... as $1, $2, ... and loses $1.

Another kind of use for the while/until loop is to wait until some external event occurs and then run some commands. In an until loop the termination condition is reversed. For example,
until test -f file
do sleep 300; done
commands
will loop until file exists. Each time round the loop it waits for 5 minutes before trying again. (Presumably another process will eventually create the file.)
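Returning to the while loop with shift, the following sketch prints each of its arguments in turn, announcing how many remain; test $1 becomes false once the arguments are exhausted:

while test $1
do echo $# arguments remain
   echo $1
   shift
done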
Control flow - if

Also available is a general conditional branch of the form,
if command-list
then command-list
else command-list
fi
that tests the value returned by the last simple command following if. The if command may be used in conjunction with the test command to test for the existence of a file as in
if test -f file
then process file
else do something else
fi
An example of the use of if, case and for constructions together is given below (the touch command).

A multiple test if command of the form
if ...
then ...
else if ...
     then ...
     else if ...
          ...
          fi
     fi
fi
may be written using an extension of the if notation as,
if ...
then ...
elif ...
then ...
elif ...
...
fi

The following example is the touch command which changes the `last modified' time for a list of files. The command may be used in conjunction with make (1) to force recompilation of a list of files.
flag=
for i
do case $i in
   -c) flag=N ;;
   *) if test -f $i
      then ln $i junk$$; rm junk$$
      elif test $flag
      then echo file \'$i\' does not exist
      else >$i
      fi
   esac
done
The -c flag is used in this command to force subsequent files to be created if they do not already exist. Otherwise, if the file does not exist, an error message is printed. The shell variable flag is set to some non-null string if the -c argument is encountered. The commands ln ...; rm ... make a link to the file and then remove it, thus causing the last modified date to be updated.

The sequence
if command1 then command2 fi
may be written
command1 && command2
Conversely,
command1 || command2
executes command2 only if command1 fails. In each case the value returned is that of the last simple command executed.

18.15 Command grouping

Commands may be grouped in two ways,
{ command-list ; }
and
( command-list )
In the first command-list is simply executed. The second form executes command-list as a separate process. For example,
(cd x; rm junk)
executes rm junk in the directory x without changing the current directory of the invoking shell. The commands
cd x; rm junk
have the same effect but leave the invoking shell in the directory x.

18.16 Debugging shell procedures

The shell provides two tracing mechanisms to help when debugging shell procedures. The first is invoked within the procedure as
set -v
(v for verbose) and causes lines of the procedure to be printed as they are read. It is useful to help isolate syntax errors. It may be invoked without modifying the procedure by saying
sh -v proc ...
where proc is the name of the shell procedure. This flag may be used in conjunction with the -n flag which prevents execution of subsequent commands. (Note that saying set -n at a terminal will render the terminal useless until an end-of-file is typed.)

The command
set -x
will produce an execution trace. Following parameter substitution each command is printed as it is executed. (Try these at the terminal to see what effect they have.) Both flags may be turned off by saying
set -
and the current setting of the shell flags is available as $-.
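For instance, if wg is the shell procedure shown earlier, then saying sh -x wg fred causes each command to be printed, after substitution, as it is executed. The trace output might look something like the following sketch (the exact format varies between systems):

+ who
+ grep fred

followed by the matching lines of output.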
18.17 Other important commands

18.17.1 The man command

The following is the man command which is used to print sections of the UNIX manual. It is called, for example, as
$ man sh
$ man -t ed
$ man 2 fork
In the first the manual section for sh is printed. Since no section is specified, section 1 is used. The second example will typeset (-t option) the manual section for ed.

cd /usr/man
: 'colon is the comment command'
: 'default is nroff ($N), section 1 ($s)'
N=n s=1
for i
do case $i in
   [1-9]*) s=$i ;;
   -t) N=t ;;
   -n) N=n ;;
   -*) echo unknown flag \'$i\' ;;
   *) if test -f man$s/$i.$s
      then ${N}roff man0/${N}aa man$s/$i.$s
      else : 'look through all manual sections'
           found=no
           for j in 1 2 3 4 5 6 7 8 9
           do if test -f man$j/$i.$j
              then man $j $i
                   found=yes
              fi
           done
           case $found in
           no) echo \'$i: manual page not found\'
           esac
      fi
   esac
done

18.17.2 Keyword parameters

Shell variables may be given values by assignment or when a shell procedure is invoked. An argument to a shell procedure of the form name=value that precedes the command name causes value to be assigned to name before execution of the procedure begins. The value of name in the invoking shell is not affected. For example,
user=fred command
will execute command with user set to fred. The -k flag causes arguments of the form name=value to be interpreted in this way anywhere in the argument list. Such names are sometimes called keyword parameters. If any arguments remain they are available as positional parameters $1, $2, ....

The set command may also be used to set positional parameters from within a procedure. For example,
set - *
will set $1 to the first file name in the current directory, $2 to the next, and so on. Note that the first argument, -, ensures correct treatment when the first file name begins with a -.

18.17.3 Parameter transmission

When a shell procedure is invoked both positional and keyword parameters may be supplied with the call. Keyword parameters are also made available implicitly to a shell procedure by specifying in advance that such parameters are to be exported. For example,
export user box
marks the variables user and box for export. When a shell procedure is invoked copies are made of all exportable variables for use within the invoked procedure. Modification of such variables within the procedure does not affect the values in the invoking shell. It is generally true of a shell procedure that it may not modify the state of its caller without explicit request on the part of the caller. (Shared file descriptors are an exception to this rule.)

Names whose value is intended to remain constant may be declared readonly. The form of this command is the same as that of the export command,
readonly name ...
Subsequent attempts to set readonly variables are illegal.

18.17.4 Parameter substitution

If a shell parameter is not set then the null string is substituted for it. For example, if the variable d is not set
echo $d
or
echo ${d}
will echo nothing. A default string may be given as in
echo ${d-.}
which will echo the value of the variable d if it is set and `.' otherwise. The default string is evaluated using the usual quoting conventions so that
echo ${d-'*'}
will echo * if the variable d is not set. Similarly
echo ${d-$1}
will echo the value of d if it is set and the value (if any) of $1 otherwise. A variable may be assigned a default value using the notation
echo ${d=.}
which substitutes the same string as
echo ${d-.}
and if d were not previously set then it will be set to the string `.'. (The notation ${...=...} is not available for positional parameters.)

If there is no sensible default then the notation
echo ${d?message}
will echo the value of the variable d if it has one, otherwise message is printed by the shell and execution of the shell procedure is abandoned. If message is absent then a standard message is printed.
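A common use of the default notation is to let an environment setting override a built-in choice. A small sketch (the variable and directory names are only illustrative):

dir=${TMPDIR-/tmp}
echo temporary files will go to ${dir}

If TMPDIR is set its value is used; otherwise /tmp is assumed.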
A shell procedure that requires some parameters to be set might start as follows.
: ${user?} ${acct?} ${bin?}
...
Colon (:) is a command that is built in to the shell and does nothing once its arguments have been evaluated. If any of the variables user, acct or bin are not set then the shell will abandon execution of the procedure.

18.17.5 Command substitution

The standard output from a command can be substituted in a similar way to parameters. The command pwd prints on its standard output the name of the current directory. For example, if the current directory is /usr/fred/bin then the command
d=`pwd`
is equivalent to
d=/usr/fred/bin
The entire string between grave accents (`...`) is taken as the command to be executed and is replaced with the output from the command. The command is written using the usual quoting conventions except that a ` must be escaped using a \. For example,
ls `echo "$1"`
is equivalent to
ls $1
Command substitution occurs in all contexts where parameter substitution occurs (including here documents) and the treatment of the resulting text is the same in both cases. This mechanism allows string processing commands to be used within shell procedures. An example of such a command is basename which removes a specified suffix from a string. For example,
basename main.c .c
will print the string main. Its use is illustrated by the following fragment from a cc command.
case $A in
...
*.c) B=`basename $A .c`
...
esac
that sets B to the part of $A with the suffix .c stripped.

Here are some composite examples.
for i in `ls -t`; do ...
The variable i is set to the names of files in time order, most recent first.
set `date`; echo $6 $2 $3, $4
will print, e.g., 1977 Nov 1, 23:59:59

18.17.6 Evaluation and quoting

The shell is a macro processor that provides parameter substitution, command substitution and file name generation for the arguments to commands. This section discusses the order in which these evaluations occur and the effects of the various quoting mechanisms. Commands are parsed initially according to the shell's grammar. Before a command is executed the following substitutions occur.
parameter substitution, e.g. $user
command substitution, e.g. `pwd`
Only one evaluation occurs so that if, for example, the value of the variable X is the string $y then
echo $X
will echo $y.

18.17.7 Error handling

The treatment of errors detected by the shell depends on the type of error and on whether the shell is being used interactively. An interactive shell is one whose input and output are connected to a terminal (as determined by gtty (2)). A shell invoked with the -i flag is also interactive. Execution of a command (see also the discussion of command execution below) may fail for any of the following reasons. Input-output redirection may fail; for example, if a file does not exist or cannot be created. The command itself does not exist or cannot be executed. The command terminates abnormally, for example, with a "bus error" or "memory fault"; see the list of UNIX signals below. The command terminates normally but returns a non-zero exit status. In all of these cases the shell will go on to execute the next command. Except for the last case an error message will be printed by the shell. All remaining errors cause the shell to exit from a command procedure. An interactive shell will return to read another command from the terminal. Such errors include the following. Syntax errors, e.g., if ... then ... done. A signal such as interrupt.
In the case of a signal the shell waits for the current command, if any, to finish execution and then either exits or returns to the terminal. Failure of any of the built-in commands such as cd also causes such an exit. The shell flag -e causes the shell to terminate if any error is detected.

The following are some of the values of the UNIX signals.

1   hangup
2   interrupt
3*  quit
4*  illegal instruction
5*  trace trap
6*  IOT instruction
7*  EMT instruction
8*  floating point exception
9   kill (cannot be caught or ignored)
10* bus error
11* segmentation violation
12* bad argument to system call
13  write on a pipe with no one to read it
14  alarm clock
15  software termination (from kill (1))

Those signals marked with an asterisk produce a core dump if not caught. However, the shell itself ignores quit which is the only external signal that can cause a dump. The signals in this list of potential interest to shell programs are 1, 2, 3, 14 and 15.

18.17.8 Fault handling

Shell procedures normally terminate when an interrupt is received from the terminal. The trap command is used if some cleaning up is required, such as removing temporary files. For example,
trap 'rm /tmp/ps$$; exit' 2
sets a trap for signal 2 (terminal interrupt), and if this signal is received will execute the commands
rm /tmp/ps$$; exit
exit is another built-in command that terminates execution of a shell procedure. The exit is required; otherwise, after the trap has been taken, the shell will resume executing the procedure at the place where it was interrupted.

UNIX signals can be handled in one of three ways. They can be ignored, in which case the signal is never sent to the process. They can be caught, in which case the process must decide what action to take when the signal is received. Lastly, they can be left to cause termination of the process without it having to take any further action. If a signal is being ignored on entry to the shell procedure, for example, by invoking it in the background (see the discussion of command execution below) then trap commands (and the signal) are ignored.

The use of trap is illustrated by this modified version of the touch command. The cleanup action is to remove the file junk$$.
flag=
trap 'rm -f junk$$; exit' 1 2 3 15
for i
do case $i in
   -c) flag=N ;;
   *) if test -f $i
      then ln $i junk$$; rm junk$$
      elif test $flag
      then echo file \'$i\' does not exist
      else >$i
      fi
   esac
done
The trap command appears before the creation of the temporary file; otherwise it would be possible for the process to die without removing the file.

Since there is no signal 0 in UNIX it is used by the shell to indicate the commands to be executed on exit from the shell procedure. A procedure may, itself, elect to ignore signals by specifying the null string as the argument to trap. The following fragment is taken from the nohup command.
trap '' 1 2 3 15
which causes hangup, interrupt, quit and kill to be ignored both by the procedure and by invoked commands. Traps may be reset by saying
trap 2 3
which resets the traps for signals 2 and 3 to their default values. A list of the current values of traps may be obtained by writing
trap
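Pulling these pieces together, a procedure can arrange to remove its scratch file however it terminates, by trapping both the common signals and the exit `signal' 0 (a sketch; the file names are hypothetical):

trap 'rm -f /tmp/sort$$; exit' 1 2 3 15
trap 'rm -f /tmp/sort$$' 0
sort $1 >/tmp/sort$$
...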
The procedure scan (given below) is an example of the use of trap where there is no exit in the trap command. scan takes each directory in the current directory, prompts with its name, and then executes commands typed at the terminal until an end of file or an interrupt is received. Interrupts are ignored while executing the requested commands but cause termination when scan is waiting for input.
d=`pwd`
for i in *
do if test -d $d/$i
   then cd $d/$i
        while echo "$i:"
              trap exit 2
              read x
        do trap : 2; eval $x; done
   fi
done
read x is a built-in command that reads one line from the standard input and places the result in the variable x. It returns a non-zero exit status if either an end-of-file is read or an interrupt is received.

18.17.9 Command execution

To run a command (other than a built-in) the shell first creates a new process using the system call fork. The execution environment for the command includes input, output and the states of signals, and is established in the child process before the command is executed. The built-in command exec is used in the rare cases when no fork is required and simply replaces the shell with a new command. For example, a simple version of the nohup command looks like
trap '' 1 2 3 15
exec $*
The trap turns off the signals specified so that they are ignored by subsequently created commands and exec replaces the shell by the command specified.

Most forms of input output redirection have already been described. In the following, word is only subject to parameter and command substitution. No file name generation or blank interpretation takes place so that, for example,
echo ... >*.c
will write its output into a file whose name is *.c. Input output specifications are evaluated left to right as they appear in the command.

> word    The standard output (file descriptor 1) is sent to the file word which is created if it does not already exist.
>> word   The standard output is sent to file word. If the file exists then output is appended (by seeking to the end); otherwise the file is created.
< word    The standard input (file descriptor 0) is taken from the file word.
<< word   The standard input is taken from the lines of shell input that follow up to but not including a line consisting only of word. If word is quoted then no interpretation of the document occurs. If word is not quoted then parameter and command substitution occur and \ is used to quote the characters \ $ ` and the first character of word. In the latter case \newline is ignored (c.f. quoted strings).
>& digit  The file descriptor digit is duplicated using the system call dup (2) and the result is used as the standard output.
<& digit  The standard input is duplicated from file descriptor digit.
<&-       The standard input is closed.
>&-       The standard output is closed.

Any of the above may be preceded by a digit in which case the file descriptor created is that specified by the digit instead of the default 0 or 1. For example,
... 2>file
runs a command with message output (file descriptor 2) directed to file.
... 2>&1
runs a command with its standard output and message output merged. (Strictly speaking file descriptor 2 is created by duplicating file descriptor 1 but the effect is usually to merge the two streams.)
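A short sketch combining these notations (the file names are hypothetical): the first command keeps diagnostics apart from ordinary output, while the second deliberately merges the two streams:

cc pgm.c >log 2>errs
cc pgm.c >log 2>&1

Since redirections are evaluated left to right, writing 2>&1 before >log would instead send the diagnostics to the terminal.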
The environment for a command run in the background such as
list *.c | lpr &
is modified in two ways. Firstly, the default standard input for such a command is the empty file /dev/null. This prevents two processes (the shell and the command), which are running in parallel, from trying to read the same input. Chaos would ensue if this were not the case. For example,
ed file &
would allow both the editor and the shell to read from the same input at the same time.

The other modification to the environment of a background command is to turn off the QUIT and INTERRUPT signals so that they are ignored by the command. This allows these signals to be used at the terminal without causing background commands to terminate. For this reason the UNIX convention for a signal is that if it is set to 1 (ignored) then it is never changed even for a short time. Note that the shell command trap has no effect for an ignored signal.

18.17.10 Invoking the shell

The following flags are interpreted by the shell when it is invoked. If the first character of argument zero is a minus, then commands are read from the file .profile.
-c string   If the -c flag is present then commands are read from string.
-s          If the -s flag is present or if no arguments remain then commands are read from the standard input. Shell output is written to file descriptor 2.
-i          If the -i flag is present or if the shell input and output are attached to a terminal (as told by gtty) then this shell is interactive. In this case TERMINATE is ignored (so that kill 0 does not kill an interactive shell) and INTERRUPT is caught and ignored (so that wait is interruptable). In all cases QUIT is ignored by the shell.

18.18 Let us sum up

In this lesson we have learnt about
a) the kernel
b) and shell commands

18.19 Points for Discussion

a) Discuss the importance of UNIX
b) Differentiate between kernel and shell

18.20 Model Answers to Check your Progress

In order to check your progress, try to answer the following
o What is a shell variable
o What is the use of the man command
o Describe the various control flows in the UNIX shell

18.21 Lesson - end activities

After learning this chapter, try to discuss among your friends and answer these questions to check your progress.
a) Discuss the kernel
b) Discuss any 10 shell commands

18.22 References

H.M. Deitel, Chapter 18 of "Operating Systems", Second Edition, Pearson Education, 2001
Andrew S. Tanenbaum, Chapter 7 of "Modern Operating Systems", PHI, 1996

LESSON – 19: PROCESS, MEMORY AND I/O OF UNIX

CONTENTS
19.1 Aims and Objectives
19.2 Introduction to process management
19.2.1 ps
19.2.2 Runaway Processes
19.2.3 Killing Processes
19.2.4 Top
19.2.5 nice and renice
19.2.6 Job Control
19.3 Memory Management
19.3.1 Kinds of Memory
19.3.2 OS Memory Uses
19.3.3 Process Memory Uses
19.4 The Input/output system
19.4.1 Descriptors and I/O
19.4.2 Descriptor Management
19.4.3 Devices
19.4.4 Socket IPC
19.4.5 Scatter/Gather I/O
19.5 Let us sum up
19.6 Points for Discussion
19.7 Model Answers to Check your Progress
19.8 Lesson - end activities
19.9 References

19.1 Aims and Objectives

In this lesson we will learn about process, memory and I/O management in UNIX. The objectives of this lesson are to make the candidate aware of the following
a) the process concept
b) the memory concept
c) and input/output in UNIX

19.2 Introduction to Process Management

Any time you use a command on a Unix system, that command runs as a process on the system. Once the command runs to completion, or is otherwise interrupted, the process should terminate. Most commands do this flawlessly; however, sometimes odd things will happen. This document will attempt to help you understand how processes work and how to manage your own processes.

19.2.1 ps

The first topic to cover is how to get information on processes. There are several ways to do this and this is not meant to be an exhaustive listing of methods to use. To find out which processes are running under your username at this moment, use the command
ps -u username
replacing the word "username" with your own username. The output from ps shows four columns by default: PID, TTY, TIME, and CMD.
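The output might look like the following sketch (the user name, process IDs and commands are purely illustrative):

$ ps -u fred
  PID TTY      TIME CMD
 5012 pts/1    0:00 bash
 5044 pts/1    0:02 vi
 5067 pts/1    0:00 ps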
PID is short for process ID, a number which uniquely identifies the process on the system. Whenever a process is started, it is given a PID. You can use this PID to interact with the process at a later time. The TTY field shows the terminal associated with the process. You do not generally need to worry about the value of this field. The TIME field displays the total amount of CPU time the process has used, not the time since the process was started. This will typically be quite small, unless your process is very CPU intensive. The final field is CMD, which shows the name of the command associated with that process.

If you are using Borg or Locutus, you will notice that the bash process will be running any time you are logged in. This is the shell that you are currently using to run commands. So long as bash is set as your default shell, it will be there when you are logged in. If you get a friend to run this command from their account with your username while you are not logged in, nothing should show up at all. Please note that this is not the only way to use ps; for more information you should type man ps on the command line for a detailed listing of the different options you can use.

19.2.2 Runaway Processes

So what happens if you are not logged in and there are still processes running? This can mean one of several things, but more than likely you have set a process to run in the background by adding a & at the end of the command (more on this later) and did not close the program or kill the process. Another possibility is that you have a program that has run away. This means that the process should have completed properly and you have taken all the correct steps, but for any number of reasons, it did not terminate properly.

19.2.3 Killing Processes

In order to kill a process that has run away or that you have otherwise lost control of, you should use the kill command. There are two popular ways of using kill:
kill pid
where "pid" is the process ID number listed under the PID column when you use ps. This is a graceful way to kill your process by having the operating system send the process a signal that tells it to stop what it's doing and terminate. Sometimes this doesn't work and you have to use a more aggressive version of kill:
kill -9 pid
will get the operating system to kill the process immediately. This can be dangerous if the process has control of a lock mechanism, but your process will stop.

If you find yourself completely stuck, terminals have frozen upon you and you just don't know what to do, you can log in again and use kill -9 -1. This command will kill all of your processes including your current login. The next time you log in, everything that was running or was locked up will have terminated. Please note that using this command as root will bring down a system quick, fast and in a hurry, so use this command only as a user, never as root. This may also result in lost work, so use it only as a last resort.

19.2.4 Top

Another very useful command is the top command. This will allow you to see the processes that are using the most CPU time, memory, and so on. If you find that one of your processes is running near the top of the listing offered by top, it means that you are taking up more CPU time than most people on the system. You will see a column that has %CPU as its heading. If your process is taking more than 5% of the CPU on Borg or Locutus for any extended period of time, you should investigate to see what it's doing. It is quite possible that you are using a program that does require this much CPU time. This is perfectly alright and will not pose a problem. If, however, you find that the program isn't doing anything or should already have completed, you should consider killing it.
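Putting ps and kill together, a cleanup session might look like this sketch (the user name and PID are illustrative):

$ ps -u fred
  PID TTY      TIME CMD
 6120 pts/2   14:37 a.out
$ kill 6120        # ask the process to terminate gracefully
$ kill -9 6120     # force it, only if it is still there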
19.2.5 nice and renice

If you are about to run some program that you know will consume a significant amount of resources on the servers, please send an e-mail to cshelp first to warn our technical support team. If possible, you should first consider running such a process on workstations rather than on the server. You can log in and run any process on our Sun workstations just as well as on our servers. If you must run a job that requires a lot of CPU time, you should nice the process.

The nice and renice commands change the priority at which a process runs. nice is used to start a process with a different priority, and renice changes the priority of a running process. To start a process with nice, run
nice -n nicevalue command arguments
where nicevalue is a number between 1 and 19, and command and arguments are the command you want to run and its arguments. The larger the number given to nice, the lower the priority the command is given. Note that only the root user can increase priority with nice.

If you have already begun to run a process which is using a lot of CPU time, you can also re-nice that process. renice acts the same way as nice, except it is used on running processes. To re-nice a process, simply type
renice -n nicevalue -p pid
Here, nicevalue is the new priority, and pid is the process ID of the process you want to re-nice. You can also renice all your processes, useful if you're feeling guilty about the amount of CPU time your processes are using. To do this, simply use
renice -n nicevalue -u username
Note that you can never increase your priority with renice, nor can you use it on processes owned by other users.

19.2.6 Job Control

Many shells, including the default bash shell, support what is known as job control. This allows you to run several processes, at the same time, from the same terminal. You can run processes that may take a long time to complete in the background, freeing the terminal for other use. You can suspend running processes, move them to the background, put background processes back into the foreground, and list all your currently running jobs.

Running processes in the background is important for processes that take a long time to complete, but don't require any user interaction from the shell. If you start a process such as netscape from the command line, it is a good idea to place it in the background so the shell is not tied up. All the output from the process is still visible, unless it is specifically disabled. To run a process in the background, simply add a & character to the end of the command line. The command line prompt returns immediately, and the shell prints some important information regarding the background process, as the sketch below shows. The number inside the square brackets (here it is 1) indicates the job number of the background process. This number is used by other job control commands to identify this process. The second number (here it is 5044) is the process ID of the process. The difference between the two numbers is that the job number is unique to the shell, while the process ID is unique to the operating system.
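A sketch of such a session (the job number and process ID shown are the illustrative values referred to above):

$ netscape &
[1] 5044
$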
If you run a program normally, but then decide you would like to run it in the background, you must first suspend the process by hitting Ctrl-Z. This suspends execution of the process, and gives you a command line. To move the suspended process to the background, type bg. The process then resumes execution, exactly as if you had typed & when starting the process.

To later bring a background process back into the foreground, use the fg command. This command takes an argument specifying which background process to place in the foreground. This is the job number displayed when the process was placed in the background. If you cannot remember the job number of the process you wish to bring back to the foreground, the jobs command will list all the jobs you are currently running and their numbers. This is similar to a small version of ps, but is not quite the same: the jobs command only lists processes started from that particular shell.

19.3 Memory Management

Unlike traditional PC operating systems, Unix related systems use very sophisticated memory management algorithms to make efficient use of memory resources. This makes the questions "How much memory do I have?" and "How much memory is being used?" rather complicated to answer. First you must realize that there are three different kinds of memory, three different ways they can be used by the operating system, and three different ways they can be used by processes.
For the purposes of most programs, the model is further simplified to being a stream of data bytes, or an I/O stream. It is this single common data form that makes the characteristic UNIX tool-based approach work Kernighan & Pike, 1984. An I/O stream from one program can be fed as input to almost any other program. (This kind of traditional UNIX I/O stream should not be confused with the Eighth Edition stream I/O system or with the System V, Release 3 STREAMS, both of which can be accessed as traditional I/O streams.) 177 19.4.1 Descriptors and I/O UNIX processes use descriptors to reference I/O streams. Descriptors are small unsigned integers obtained from the open and socket system calls. The open system call takes as arguments the name of a file and a permission mode to specify whether the file should be open for reading or for writing, or for both. This system call also can be used to create a new, empty file. A read or write system call can be applied to a descriptor to transfer data. The close system call can be used to deallocate any descriptor. Descriptors represent underlying objects supported by the kernel, and are created by system calls specific to the type of object. In 4.4BSD, three kinds of objects can be represented by descriptors: files, pipes, and sockets. A file is a linear array of bytes with at least one name. A file exists until all its names are deleted explicitly and no process holds a descriptor for it. A process acquires a descriptor for a file by opening that file's name with the open system call. I/O devices are accessed as files. A pipe is a linear array of bytes, as is a file, but it is used solely as an I/O stream, and it is unidirectional. It also has no name, and thus cannot be opened with open. Instead, it is created by the pipe system call, which returns two descriptors, one of which accepts input that is sent to the other descriptor reliably, without duplication, and in order. The system also supports a named pipe or FIFO. A FIFO has properties identical to a pipe, except that it appears in the filesystem; thus, it can be opened using the open system call. Two processes that wish to communicate each open the FIFO: One opens it for reading, the other for writing. A socket is a transient object that is used for interprocess communication; it exists only as long as some process holds a descriptor referring to it. A socket is created by the socket system call, which returns a descriptor for it. There are different kinds of sockets that support various communication semantics, such as reliable delivery of data, preservation of message ordering, and preservation of message boundaries. In systems before 4.2BSD, pipes were implemented using the filesystem; when sockets were introduced in 4.2BSD, pipes were reimplemented as sockets. The kernel keeps for each process a descriptor table, which is a table that the kernel uses to translate the external representation of a descriptor into an internal representation. (The descriptor is merely an index into this table.) The descriptor table of a process is inherited from that process's parent, and thus access to the objects to which the descriptors refer also is inherited. The main ways that a process can obtain a descriptor are by opening or creation of an object, and by inheritance from the parent process. In addition, socket IPC allows passing of descriptors in messages between unrelated processes on the same machine. Every valid descriptor has an associated file offset in bytes from the beginning of the object. 
Read and write operations start at this offset, which is updated after each data transfer. For objects that permit random access, the file offset also may be set with the lseek system call. Ordinary files permit random access, and some devices do, as well. Pipes and sockets do not.

When a process terminates, the kernel reclaims all the descriptors that were in use by that process. If the process was holding the final reference to an object, the object's manager is notified so that it can do any necessary cleanup actions, such as final deletion of a file or deallocation of a socket.

19.4.2 Descriptor Management

Most processes expect three descriptors to be open already when they start running. These descriptors are 0, 1, 2, more commonly known as standard input, standard output, and standard error, respectively. Usually, all three are associated with the user's terminal by the login process and are inherited through fork and exec by processes run by the user. Thus, a program can read what the user types by reading standard input, and the program can send output to the user's screen by writing to standard output. The standard error descriptor also is open for writing and is used for error output, whereas standard output is used for ordinary output.

These (and other) descriptors can be mapped to objects other than the terminal; such mapping is called I/O redirection, and all the standard shells permit users to do it. The shell can direct the output of a program to a file by closing descriptor 1 (standard output) and opening the desired output file to produce a new descriptor 1. It can similarly redirect standard input to come from a file by closing descriptor 0 and opening the file.

Pipes allow the output of one program to be input to another program without rewriting or even relinking of either program. Instead of descriptor 1 (standard output) of the source program being set up to write to the terminal, it is set up to be the input descriptor of a pipe. Similarly, descriptor 0 (standard input) of the sink program is set up to reference the output of the pipe, instead of the terminal keyboard. The resulting set of two processes and the connecting pipe is known as a pipeline. Pipelines can be arbitrarily long series of processes connected by pipes.

The open, pipe, and socket system calls produce new descriptors with the lowest unused number usable for a descriptor. For pipelines to work, some mechanism must be provided to map such descriptors into 0 and 1. The dup system call creates a copy of a descriptor that points to the same file-table entry. The new descriptor is also the lowest unused one, but if the desired descriptor is closed first, dup can be used to do the desired mapping. Care is required, however: if descriptor 1 is desired, and descriptor 0 happens also to have been closed, descriptor 0 will be the result. To avoid this problem, the system provides the dup2 system call; it is like dup, but it takes an additional argument specifying the number of the desired descriptor (if the desired descriptor was already open, dup2 closes it before reusing it).

19.4.3 Devices

Hardware devices have filenames, and may be accessed by the user via the same system calls used for regular files. The kernel can distinguish a device special file or special file, and can determine to what device it refers, but most processes do not need to make this determination. Terminals, printers, and tape drives are all accessed as though they were streams of bytes, like 4.4BSD disk files.
19.4.3 Devices

Hardware devices have filenames, and may be accessed by the user via the same system calls that are used for regular files. The kernel can distinguish a device special file, or special file, and can determine to what device it refers, but most processes do not need to make this determination. Terminals, printers, and tape drives are all accessed as though they were streams of bytes, like 4.4BSD disk files. Thus, device dependencies and peculiarities are kept in the kernel as much as possible, and even within the kernel most of them are segregated in the device drivers.

Hardware devices can be categorized as either structured or unstructured; they are known as block or character devices, respectively. Processes typically access devices through special files in the filesystem. I/O operations on these files are handled by kernel-resident software modules termed device drivers. Most network-communication hardware devices are accessible only through the interprocess-communication facilities and do not have special files in the filesystem name space, because the raw-socket interface provides a more natural interface than a special file does.

Structured or block devices are typified by disks and magnetic tapes, and include most random-access devices. The kernel supports read-modify-write buffering on block-oriented structured devices so that they can be read and written in a totally random, byte-addressed fashion, like regular files. Filesystems are created on block devices.

Unstructured devices are those that do not support a block structure. Familiar unstructured devices are communication lines, raster plotters, and unbuffered magnetic tapes and disks. Unstructured devices typically support large block I/O transfers. Unstructured files are called character devices because the first of these to be implemented were terminal device drivers. The kernel interface to the driver for these devices proved convenient for other devices that were not block structured.

Device special files are created by the mknod system call. There is an additional system call, ioctl, for manipulating the underlying device parameters of special files. The operations that can be done differ for each device. This system call allows the special characteristics of devices to be accessed, rather than overloading the semantics of other system calls. For example, there is an ioctl on a tape drive to write an end-of-tape mark, instead of there being a special or modified version of write.

19.4.4 Socket IPC

The 4.2BSD kernel introduced an IPC mechanism more flexible than pipes, based on sockets. A socket is an endpoint of communication referred to by a descriptor, just like a file or a pipe. Two processes can each create a socket and then connect those two endpoints to produce a reliable byte stream. Once connected, the descriptors for the sockets can be read or written by processes, just as they would read or write a pipe. The transparency of sockets allows the kernel to redirect the output of one process to the input of another process residing on another machine.

A major difference between pipes and sockets is that pipes require a common parent process to set up the communications channel. A connection between sockets can be set up by two unrelated processes, possibly residing on different machines.

System V provides local interprocess communication through FIFOs (also known as named pipes). FIFOs appear as objects in the filesystem that unrelated processes can open and send data through, in the same way that they would communicate through a pipe. Thus, FIFOs do not require a common parent to set them up; they can be connected after a pair of processes are up and running. Unlike sockets, FIFOs can be used only on a local machine; they cannot be used to communicate between processes on different machines. FIFOs are implemented in 4.4BSD only because they are required by the POSIX.1 standard; their functionality is a subset of the socket interface.

The socket mechanism requires extensions to the traditional UNIX I/O system calls to provide the associated naming and connection semantics. Rather than overloading the existing interface, the developers used the existing interfaces to the extent that they worked without being changed, and designed new interfaces to handle the added semantics. The read and write system calls were used for byte-stream connections, but six new system calls were added to allow sending and receiving addressed messages such as network datagrams. The system calls for writing messages are send, sendto, and sendmsg; those for reading messages are recv, recvfrom, and recvmsg. In retrospect, the first two in each class are special cases of the others; recvfrom and sendto probably should have been added as library interfaces to recvmsg and sendmsg, respectively.
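The following C sketch exercises two of these new calls, sendto and recvfrom, using a pair of datagram sockets on the loopback interface. For brevity both sockets live in a single process; two unrelated processes could do exactly the same. The port number 9999 is an arbitrary assumption.

    /* Sketch of the addressed-message calls: one UDP socket bound
     * to a loopback port, another sending to it with sendto.
     * The port number 9999 is an arbitrary choice. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[64];
        struct sockaddr_in addr;

        int rcv = socket(AF_INET, SOCK_DGRAM, 0);  /* descriptors, like files */
        int snd = socket(AF_INET, SOCK_DGRAM, 0);
        if (rcv < 0 || snd < 0) {
            perror("socket");
            return 1;
        }

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(9999);
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

        if (bind(rcv, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind");
            return 1;
        }

        /* sendto supplies the destination address with each message */
        sendto(snd, "datagram\n", 9, 0,
               (struct sockaddr *)&addr, sizeof(addr));

        /* recvfrom returns one whole message; boundaries are preserved */
        ssize_t n = recvfrom(rcv, buf, sizeof(buf), 0, NULL, NULL);
        if (n > 0)
            fwrite(buf, 1, (size_t)n, stdout);

        close(rcv);
        close(snd);
        return 0;
    }

Had the two endpoints been connected stream sockets instead, plain read and write would have sufficed, which is exactly the distinction the six new calls were added to capture.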
19.4.5 Scatter/Gather I/O

In addition to the traditional read and write system calls, 4.2BSD introduced the ability to do scatter/gather I/O. Scatter input uses the readv system call to allow a single read to be placed in several different buffers. Conversely, the writev system call allows several different buffers to be written in a single atomic write. Instead of passing a single buffer and length, as is done with read and write, the process passes a pointer to an array of buffers and lengths, along with a count describing the size of the array.

This facility allows buffers in different parts of a process address space to be written atomically, without the need to copy them into a single contiguous buffer. Atomic writes are necessary when the underlying abstraction is record based, such as a tape drive that outputs a tape block on each write request. It is also convenient to be able to read a single request into several different buffers (such as a record header into one place and the data into another). Although an application can simulate scattering by reading the data into a large buffer and then copying the pieces to their intended destinations, the cost of the memory-to-memory copying in such cases often would more than double the running time of the affected application.

Just as send and recv could have been implemented as library interfaces to sendto and recvfrom, it would also have been possible to simulate read with readv and write with writev. However, read and write are used so much more frequently that the added cost of simulating them would not have been worthwhile.

19.5 Let us sum up

In this lesson we have learnt about
a) process management in UNIX
b) memory management in UNIX
c) input/output in UNIX

19.6 Points for Discussion

Discuss input/output in UNIX.

19.7 Model Answers to “Check your Progress”

To check your progress, try to answer the following questions:
a) Explain socket IPC.
b) Explain descriptor management.

19.8 Lesson - end activities

After learning this lesson, try to discuss among your friends and answer these questions to check your progress.
a) Discuss process management in UNIX.
b) Discuss memory management in UNIX.

19.9 References

Brian W. Kernighan and Rob Pike, “The UNIX Programming Environment”, Prentice-Hall, 1984
H.M. Deitel, Chapter 18 of “Operating Systems”, Second Edition, Pearson Education, 2001
Andrew S. Tanenbaum, Chapter 7 of “Modern Operating Systems”, PHI, 1996