Operating System

UNIT - I
LESSON – 1: INTRODUCTION TO DOS AND UNIX
CONTENTS
1.1 Aims and Objectives
1.2 Introduction
1.3 History of DOS
1.3.1 Definition of DOS
1.3.2 History and versions
1.3.3 DOS Environment
1.4 History of Unix
1.5 Let us sum up
1.6 Points for Discussion
1.7 Model answers to “Check your Progress”
1.8 Lesson - end activities
1.9 References
1.1 Aims and Objectives
Computer software can be divided into two main categories: application software and
system software. Application software consists of the programs for performing tasks
particular to the machine's utilization. Examples of application software include spreadsheets,
database systems, desktop publishing systems, program development software, and games.
The most important type of system software is the operating system. An operating system
has three main responsibilities: i) Perform basic tasks, such as recognizing input from the
keyboard, sending output to the display screen, keeping track of files and directories on the
disk, and controlling peripheral devices such as disk drives and printers. ii) Ensure that
different programs and users running at the same time do not interfere with each other. iii)
Provide a software platform on top of which other programs (i.e., application software) can
run.
In this lesson we will learn the basics, history, versions and environment of DOS. This
lesson also covers the basic definitions and history of UNIX.
The objectives of this lesson are to make the student aware of the basic concepts and
history of both DOS and UNIX.
1.2 Introduction
The functions and working of an operating system can be better understood by
beginners if we start with an example that is analogous to an operating system. Let us
take the example of an office environment, as follows.
Consider the office in which you work. There is likely to be a workspace (desk), storage
space (filing cabinets, drawers, etc.), and tools (pencils, pens, rulers, calculators, etc.). The
layout of the room and the laws of physics (thank you, Newton) dictate how we can
accomplish tasks within this office. If we want to work on a file, we must go to a drawer, pull
the drawer open, open the folder where the document is, grab the document, and go to the
desk; you get the idea.
On a computer, how you work is defined not by the laws of physics, but by the rules
dictated by your computer's hardware and software. The software aspect of those rules is
what we call an operating system. So in effect, the programmer (actually a series of teams of
many programmers and designers) has created a new environment for working. In order to do
something in that environment, you must follow the rules of this new environment. In the
physical world, we must follow Newton's laws in order to accomplish any task. Likewise on a
computer, we must follow the programmer's laws; the only difference is that Newton did not
decide what these rules would be!
The basic functions of an operating system are as follows:
 File management--analogous to the file cabinet and its use
 Working with the files--analogous to picking up and preparing to use a calculator or
some other tool
 Configuration of your working environment--analogous to shifting your desk around
to suit you better
1.3 History of DOS
1.3.1 Definition of DOS
DOS (an acronym for Disk Operating System) is a tool which allows you to control
the operation of the Personal Computer. DOS is software that was written to control
hardware. DOS can be used for a wide range of tasks. You will be able to manage well if
you master only a small subset of DOS commands and functions. The environment provided
by DOS is to give the user quick and direct access to the basic utility of the computer. All
tasks are accomplished by typing in commands at a command prompt (at the cursor).
The operating system offers a hardware abstraction layer sufficient for developing
character-based applications, but not for accessing most other hardware, such as graphics
cards, printers, or mice. This required programmers to access the hardware directly, resulting
in each application having its own set of device drivers for each hardware peripheral.
Hardware manufacturers would release specifications to ensure that device drivers for
popular applications were available.
1.3.2 History and Versions
An operating system was needed for IBM's 8086 line of computers, but
negotiations for the use of CP/M on these broke down. IBM approached Microsoft's CEO,
Bill Gates, who purchased QDOS from SCP, allegedly for $50,000. This became the
Microsoft Disk Operating System, MS-DOS. Microsoft also licensed the system to multiple
computer companies, who used their own names for it. Eventually, Microsoft would require
the use of the MS-DOS name, with the exception of the IBM variant, which would continue
to be developed concurrently and sold as PC-DOS (this was for IBM's new 'PC' using the
8088 CPU, internally the same as the 8086).
Early versions of Microsoft Windows were little more than a graphical shell for DOS,
and later versions of Windows were tightly integrated with MS-DOS. It is also possible to run
DOS programs under OS/2 and Linux using virtual-machine emulators. Because of the long
existence and ubiquity of DOS in the world of the PC-compatible platform, DOS was often
considered to be the native operating system of the PC compatible platform.
There are alternative versions of DOS, such as FreeDOS and OpenDOS. FreeDOS
appeared in 1994, in response to Microsoft's announcement that standalone MS-DOS would
be discontinued with the arrival of Windows 95.
MS-DOS (and IBM PC-DOS, which was licensed from it), and its predecessor,
86-DOS, were inspired by CP/M (Control Program for Microcomputers), which was the
dominant disk operating system for 8-bit Intel 8080 and Zilog Z80 based microcomputers.
Tim Paterson at Seattle Computer Products developed a variant of CP/M-80, intended as an
internal product for testing SCP's new 8086 CPU card for the S-100 bus; since CP/M-80
required the 8080 CPU, it would not run on the 8086. The system was named 86-DOS (it had
initially been called QDOS, which stood for Quick and Dirty Operating System).
Digital Research would attempt to regain the market with DR-DOS, an MS-DOS and
CP/M hybrid. Digital Research would later be bought out by Novell, and DR DOS became
Novell DOS 7. DR DOS would later be part of Caldera (as OpenDOS), Lineo (as DR DOS),
and DeviceLogics.
1.3.3 DOS Working Environment
This subsection will give you a general understanding about the command prompt,
directory, Directory, Working with the files, File naming conventions, Viewing, Editing,
Executing, Stop Execution, Printing, Backup files, and Rebooting
a) Command Prompt
If we take a look at the computer screen, we are likely to see a blank screen with the
exception of a few lines, at least one of which begins with a capital letter followed by a colon
and a backslash and ends with a greater-than symbol (>):
C:\>
Any line in DOS that begins like this is a command prompt. This line prompt is the
main way users know where they are in DOS. Here is how:
 The C: tells the user that he/she is working within the filespace (disk storage) on the
hard drive given the designation C. C is usually reserved for the internal hard disk of a
PC.
 The backslash (\) represents a level in the hierarchy of the file structure. There is
always at least one because it represents the root directory, the very first level of your
hard disk.
In a graphical representation of the file structure, folder icons represent directories, and
the document icons represent actual files in those directories, showing how a file can be
stored in different levels on a hard disk. In DOS, the same file, SAMPLE, is represented
this way:
C:\DEMO\DOS&WIN\SAMPLES\SAMPLE
or, to mimic a graphic representation,
C:\
DEMO\
DOS&WIN\
SAMPLES\
SAMPLE
So what C:\DEMO\DOS&WIN\SAMPLES\SAMPLE means is that the file SAMPLE
is on the internal hard disk, four levels deep (inside several nested directories). The list of
directories (\DEMO\DOS&WIN\SAMPLES\) is referred to as a pathname (following the
path of directories will get you to the file). The name of the file itself (SAMPLE) is referred
to as the filename.
b) Directory
If you need more help in orienting yourself, it sometimes helps to take a look at the
files and directories available where you are by using the DIR command.
C:\>dir
This will give you a listing of all the files and directories contained in the current
directory in addition to some information about the directory itself. You will see the word
volume in this information. Volume is simply another word for a disk that the computer has
access to. Your hard disk is a volume, your floppy disk is a volume, a server disk (hard disk
served over a network) is a volume. Now you know fancy words for all the parts of the
format DOS uses to represent a file.
Volume: C:
Pathname: \DEMO\DOS&WIN\SAMPLES\
Filename: SAMPLE
Here are some helpful extensions of the DIR command:
 C:\>dir | more
(will display the directory one screen at a time with a <more> prompt--use Control-C
to escape)
 C:\>dir /w
(wide: will display the directory in columns across the screen)
 C:\>dir /a
(all: will display the directory including hidden files and directories)
c) Working with the files
Understanding how to manage your files on your disk is not the same as being able to
use them. In DOS (and most operating systems), there are only two kinds of files, binary files
and text files (ASCII). Text files are basic files that can be read or viewed very easily. Binary
files, on the other hand, are not easily viewed. As a matter of fact, most binary files are not to
be viewed but to be executed. When you try to view these binary files (such as with a text
editor), your screen is filled with garbage and you may even hear beeps.
While there are only two kinds of files, it is often difficult to know which kind a
particular file is, because files can have any extension! Fortunately, there is a small set of
extensions that have standard meanings, like .txt, .bat, and .dat for text files and .exe and .com
for binary executables.
d) File naming conventions
Careful file naming can save time. Always choose names which provide a clue to the
file's contents. If you are working with a series of related files, use a number somewhere in
the name to indicate which version you have created. This applies only to the filename
parameter; most of the file extensions you will be using are predetermined or
reserved by DOS for certain types of file. For example: data1.dat, employee.dat.
e) Editing
You can view any text file using a text editor. For example, to open a file named
'employee.txt' in the 'work' directory of the C drive:
C:\work> edit employee.txt
f) Executing
Binary files ending in .exe are usually "executed" by typing the filename as if it were
a command. The following command would execute the WordPerfect application which
appears on the disk directory as WP.EXE:
C:\APPS\WP51>wp
Binary files ending in .com often contain one or more commands for execution either
through the command prompt or through some program.
g) Stop Execution
If you wish to stop the computer in the midst of executing the current command, you
may use the key sequence Ctrl-Break. Ctrl-Break does not always work with non-DOS
commands. Some software packages block its action in certain situations, but it is worth
trying before you re-boot.
h) Rebooting
In some cases, when all attempts to recover from a barrage of error messages fail, as
a last resort you can reboot the computer. To do this, you press the Control, Alternate and
Delete keys all at once (CTRL+ALT+DELETE). If you re-boot, you may lose some of your
work and any data active in RAM which has not yet been saved to disk.
1.4 History of Unix
In the 1960s, the Massachusetts Institute of Technology, AT&T Bell Labs, and General
Electric worked on an experimental operating system called Multics (Multiplexed
Information and Computing Service), which was designed to run on the GE-645 mainframe
computer. The aim was the creation of a commercial product, although this was never a great
success. Multics was an interactive operating system with many novel capabilities, including
enhanced security. The project did develop production releases, but initially these releases
performed poorly.
AT&T Bell Labs pulled out and deployed its resources elsewhere. One of the
developers on the Bell Labs team, Ken Thompson, continued to develop for the GE-645
mainframe, and wrote a game for that computer called Space Travel. However, he found that
the game was too slow on the GE machine and was expensive, costing $75 per execution in
scarce computing time.
Thompson thus re-wrote the game in assembly language for Digital Equipment
Corporation's PDP-7 with help from Dennis Ritchie. This experience, combined with his
work on the Multics project, led Thompson to start a new operating system for the PDP-7.
Thompson and Ritchie led a team of developers, including Rudd Canady, at Bell Labs
developing a file system as well as the new multi-tasking operating system itself. They
included a command line interpreter and some small utility programs. These developments
later emerged as a fully functional UNIX operating system. A short history of the
development process is given in the following table.

Year: Innovations
1970s: Unics (the name, suggested by Brian Kernighan, was later changed to Unix) – supported two simultaneous users – ran on the PDP-7 machine and later on the PDP-11/20 – written in assembly language
1973: UNIX – rewritten in the C language – more concise and compact
1975: Release of Versions 5, 6 and 7 – added the concept of pipes
1978: Release of PWB/UNIX and IS/1 (the first commercial Unix); release of the Interdata 7/32 port (the first non-PDP Unix)
1979: Release of UNIX/32V, for the VAX system
1980s: Development of Unix versions 8, 9 and 10; development of the X Window System – a graphical user interface for Unix; release of Unix System V; BSD Unix by Berkeley researchers – contains TCP/IP network code – the accompanying Berkeley Sockets API is a de facto standard for networking APIs
1982: Release of Xenix – Microsoft's first Unix, for 16-bit microcomputers
1984: SunOS (now Solaris) by Sun Microsystems
1987-89: The most successful Unix-related standard turned out to be the IEEE's POSIX specification, designed as a compromise API readily implemented on both BSD and System V platforms, published in 1988 and soon mandated by the United States government for many of its own systems; SCO Unix – developed from Xenix for the Intel 8086; AT&T added various features to UNIX System V, such as file locking, system administration, streams, new forms of IPC, the Remote File System and TLI, and, cooperating with Sun Microsystems, merged features from Xenix, BSD, SunOS and System V into System V Release 4 (SVR4) between 1987 and 1989, independently of X/Open
1990s: The Common Desktop Environment (CDE), a graphical desktop for UNIX co-developed by HP, IBM, and Sun as part of the COSE initiative; the Open Software Foundation released OSF/1, its standard UNIX implementation, based on Mach and BSD
1991: Free distribution of FreeBSD, OpenBSD, and NetBSD
1993: Novell developed its own version, UnixWare, merging its NetWare with UNIX System V Release 4
2000 to present: In 2005, Sun Microsystems released an open source project called OpenSolaris (based on UNIX System V Release 4); release of other distributions like SchilliX, Belenix, Nexenta, MarTux, and SGI's IRIX; only Solaris, HP-UX, and AIX are still doing relatively well in the market.
1.5 Let Us Sum Up
In this lesson we have learned about
a) the development history of DOS
b) the various versions of DOS
c) the working environment of DOS
d) History and development of UNIX
1.6 Points for Discussion
a) Discuss about the various functionalities of operating system
b) Discuss about the future of DOS
1.7 Model answers to “Check your Progress”
An operating system (OS) is the software that manages the sharing of the resources of a
computer and provides programmers with an interface used to access those resources. An
operating system processes system data and user input, and responds by allocating and
managing tasks and internal system resources as a service to users and programs of the
system. At the foundation of all system software, an operating system performs basic tasks
such as controlling and allocating memory, prioritizing system requests, controlling input and
output devices, facilitating networking and managing file systems. Most operating systems
come with an application that provides a user interface for managing the operating system,
such as a command line interpreter or graphical user interface. The operating system forms a
platform for other system software and for application software.
MS-DOS has effectively ceased to exist as a platform for desktop computing. With the
releases of Windows 9x, it was integrated into the full product, used mostly for bootstrapping,
and no longer officially released as a standalone DOS. It was still available, but became
increasingly irrelevant as development shifted to the Windows API. Windows XP contains a
copy of the core MS-DOS 8 files from Windows Millennium, accessible only by formatting a
floppy as an "MS-DOS startup disk". Attempting to run COMMAND.COM from such a disk under
the NTVDM results in the message "Incorrect MS-DOS version". With Windows Vista the
files on the startup disk are dated 18 April 2005 but are otherwise unchanged, including the
string "MS-DOS Version 8 (C) Copyright 1981-1999 Microsoft Corp" inside COMMAND.COM.
Today, DOS is still used in embedded x86 systems due to its simple architecture, and
minimal memory and processor requirements. The command line interpreter of Windows NT,
cmd.exe maintains most of the same commands and some compatibility with DOS batch
files.
1.8 Lesson-end Activities
After learning this chapter, try to discuss among your friends and answer these
questions to check your progress.
a) Discuss about DOS operating system and its Environment
b) Discuss about the UNIX Operating system and about the version of UNIX
1.9 References
a) Charles Crowley, Chapter 1 of “Operating Systems – A Design-Oriented
Approach”, Tata McGraw-Hill, 2001
b) H.M. Deitel, Chapter 1, 18, 19 of “Operating Systems”, Second Edition, Pearson
Education, 2001
c) Andrew S. Tanenbaum, Chapter 1, 7, 8 of “Modern Operating Systems”, PHI,
1996
d) D.M. Dhamdhere, Chapter 9 of “Systems Programming and Operating Systems”,
Tata McGraw-Hill, 1997
LESSON – 2: INTRODUCTION TO PROCESS
CONTENTS
2.1 Aims and Objectives
2.2 Introduction to process
2.3 Process states
2.4 Process state transitions
2.5 Operations on process
2.5.1 Suspend and resume
2.5.2 Suspending a process
2.6 Let us sum up
2.7 Points for Discussion
2.8 Model answers to “Check your Progress”
2.9 Lesson end activities
2.10 References
2.1 Aims and Objectives
In this lesson we will learn about the introduction of process, various states of the
process, and process transitions. The objectives of this lesson are to make the student aware
of the basic concepts process and its behaviors in an operating system.
2.2 Introduction to Process
A process is defined as a program in execution and is the unit of work in a modern time-sharing system. Such a system consists of a collection of processes: operating-system
processes executing system code and user processes executing user code. All these processes
can potentially execute concurrently, with the CPU (or CPUs) multiplexed among them. By
switching the CPU between processes, the operating system can make the computer more
productive.
A process is more than the program code; it also includes the program counter, the process
stack, and the contents of the processor's registers. The purpose of the process stack is to store
temporary data, such as subroutine parameters, return addresses and temporary variables. All
this information is stored in the Process Control Block (PCB). The Process Control Block
is a record containing many pieces of information associated with a process, including the
process state, program counter, CPU registers, memory-management information, accounting
information, I/O status information, CPU scheduling information, memory limits, and the list
of open files.
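
As a rough illustration, a PCB can be pictured as a C structure; the field names below are invented for this sketch and do not come from any particular kernel.

/* A sketch of a Process Control Block (PCB); field names are
   illustrative, and real kernels differ in layout and detail. */
#define MAX_OPEN_FILES 20

struct pcb {
    int           pid;                        /* process identifier                  */
    int           state;                      /* new, ready, running, waiting, ...   */
    unsigned long program_counter;            /* address of the next instruction     */
    unsigned long registers[16];              /* saved CPU registers                 */
    void         *memory_info;                /* memory-management information       */
    long          cpu_time_used;              /* accounting information              */
    int           priority;                   /* CPU scheduling information          */
    int           open_files[MAX_OPEN_FILES]; /* list of open files                  */
};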
2.3 Process States
When a process executes, it changes state; generally the state of a process is
determined by the current activity of the process. Each process may be in one of the following
states.
New: The process is being created.
Running: The process is being executed.
Waiting: The process is waiting for some event to occur.
Ready: The process is waiting to be assigned to a processor.
Terminated: The process has finished execution.
The important thing is that only one process can be running on any processor at any time,
but many processes may be in the ready and waiting states. The ready processes are loaded
into a "ready queue", a queue data structure used here to hold processes.
The operating system creates a process and prepares it to be executed; the
operating system then moves the process into the ready queue. When it is time to select a
process to run, the operating system selects one of the jobs from the ready queue and moves
that process from the ready state to the running state. When the execution of a process has
completed, the operating system terminates that process from the running state. Sometimes the
operating system terminates a process for other reasons as well, which include: time limit
exceeded, memory unavailable, access violation, protection error, I/O failure, data misuse and so on.
When the time slot of the processor expires, or if the processor receives an interrupt
signal, the operating system shifts the running process to the ready state. For example, let
process P1 be executing on the processor and, in the meantime, let process P2 generate an
interrupt signal to the processor. The processor compares the priorities of processes P1 and P2.
If the priority of P1 is higher than that of P2, the processor continues with process P1;
otherwise the processor switches to process P2, and process P1 is moved to the ready state.
A process is put into the waiting state if the process needs an event to occur or an I/O
task to be done. For example, if a running process needs an I/O device, then the
process is moved to the blocked (or waiting) state.
A process in the blocked (waiting) state is moved to the ready state when the event for
which it has been waiting occurs.
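
The states and transitions just described can be captured in a small, self-contained C sketch; the type and function names are invented for illustration.

#include <stdio.h>

enum proc_state { NEW, READY, RUNNING, WAITING, TERMINATED };

struct process { int pid; enum proc_state state; };

/* running -> waiting: the process requested an I/O device */
void block_for_io(struct process *p)   { p->state = WAITING; }

/* waiting -> ready: the awaited event occurred */
void event_occurred(struct process *p) { p->state = READY; }

int main(void)
{
    struct process p = { 1, RUNNING };
    block_for_io(&p);       /* the process asks for I/O */
    event_occurred(&p);     /* the I/O completes        */
    printf("process %d state = %d (READY)\n", p.pid, p.state);
    return 0;
}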
The OS maintains a ready list and a blocked list to store references to processes not
running. The following figure shows the process state diagram
[Figure: process state diagram with the states New, Ready, Running, Waiting and Terminated]
The new and terminated states are worth a bit more explanation. The former refers
to a process that has just been defined (e.g. because a user issued a command at a terminal),
and for which the OS has performed the necessary housekeeping chores. The latter refers to a
process whose task is not running anymore, but whose context is still being saved (e.g.
because a user may want to inspect it using a debugger program).
A simple way to implement this process handling model in a multiprogramming OS
would be to maintain a queue (i.e. a first-in-first-out linear data structure) of processes, put at
the end the queue the current process when it must be paused, and run the first process in the
queue.
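
A minimal sketch of this naive scheme in C, using a circular array as the FIFO queue of process ids (all names and sizes are invented for the example):

/* Naive two-state model: a single FIFO queue of runnable processes. */
#define MAXPROC 64

static int queue[MAXPROC];          /* process ids                        */
static int head = 0, tail = 0;

void pause_process(int pid)         /* put the paused process at the end  */
{
    queue[tail] = pid;
    tail = (tail + 1) % MAXPROC;
}

int next_process(void)              /* run the first process in the queue */
{
    int pid = queue[head];
    head = (head + 1) % MAXPROC;
    return pid;
}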
However, it's easy to realize that this simple two-state model does not work. Given
that the number of processes that an OS can manage is limited by the available resources, and
that I/O events occur on a much larger time scale than CPU events, it may well be the case that
the first process in the queue must still wait for an I/O event before being able to restart; even
worse, it may happen that most of the processes in the queue must wait for I/O. In this
condition the scheduler would just waste its time shifting the queue in search of a runnable
process.
A solution is to split the class of not-running processes according to two possible
conditions: processes blocked waiting for an I/O event to occur, and processes paused
but nonetheless ready to run when given a chance. A process would then move from the running
to the blocked state via an event-wait transition, from running to ready via
a timeout transition, and from blocked to ready via an event-occurred transition.
This model would work fine if the OS had a very large amount of main memory
available and none of the processes hogged too much of it, since in this case there would
always be a fair number of ready processes. However, because of the costs involved, this
scenario is hardly possible, and again the likely result is a list of blocked processes all waiting
for I/O.
2.4 Process State Transition
The various process states are displayed in a state diagram, with arrows indicating
possible transitions between states. Processes go through various process states which
determine how the process is handled by the operating system kernel. The specific
implementations of these states vary in different operating systems, and the names of these
states are not standardized, but the general high-level functionality is the same.
When a process is created, it needs to wait for the process scheduler (of the operating
system) to set its status to "waiting" and load it into main memory from a secondary storage
device (such as a hard disk or a CD-ROM). Once the process has been assigned to a
processor by a short-term scheduler, a context switch is performed (loading the process into
the processor) and the process state is set to "running" - where the processor executes its
instructions. If a process needs to wait for a resource (such as waiting for user input, or
waiting for a file to become available), it is moved into the "blocked" state until it no longer
needs to wait - then it is moved back into the "waiting" state. Once the process finishes
execution, or is terminated by the operating system, it is moved to the "terminated" state
where it waits to be removed from main memory. The act of assigning a processor to the first
process on the ready list is called dispatching. The OS may use an interval timer to allow a
process to run for a specific time interval or quantum.
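
As a toy illustration of dispatching with a quantum, the following self-contained C program simulates a round-robin dispatcher driven by an interval timer; the process table and tick counts are invented for the example.

#include <stdio.h>

#define QUANTUM 3                       /* illustrative quantum, in timer ticks */

struct job { int pid; int remaining; }; /* CPU ticks each process still needs   */

int main(void)
{
    struct job ready[3] = { {1, 5}, {2, 2}, {3, 4} };
    int n = 3, finished = 0, i = 0;

    while (finished < n) {
        struct job *p = &ready[i];
        if (p->remaining > 0) {
            /* dispatch: the interval timer would stop the process
               after at most QUANTUM ticks                          */
            int slice = p->remaining < QUANTUM ? p->remaining : QUANTUM;
            printf("dispatching process %d for %d tick(s)\n", p->pid, slice);
            p->remaining -= slice;
            if (p->remaining == 0) {
                printf("process %d terminated\n", p->pid);
                finished++;             /* running -> terminated        */
            }                           /* otherwise: running -> ready  */
        }
        i = (i + 1) % n;                /* next process on the ready list */
    }
    return 0;
}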
2.5 Operations on Process
There are various operations that can be performed on a process and are listed below.
a) create
b) destroy
c) suspend
d) resume
e) change priority
f) block
g) wake up
h) dispatch
i) enable
2.5.1 Suspend and Resume
The OS could then perform a suspend transition on blocked processes, swapping them
out to disk and marking their state as suspended (after all, if they must wait for I/O, they might
as well do it out of costly RAM), load into main memory a previously suspended process,
activate it into the ready state and go on. However, swapping is an I/O operation in itself, and so
at first sight things might seem to get even worse this way. Again the solution is to
carefully reconsider the reasons why processes are blocked and swapped, and recognize that
if a process is blocked because it waits for I/O and is then suspended, the I/O event might
occur while it sits swapped out on the disk.
[Figure: process state transitions with suspend and resume]
We can thus classify suspended processes into two classes: ready-suspended for those
suspended processes whose restarting condition has occurred, and blocked-suspended for those
which must still wait. This classification allows the OS to pick from the pool of
ready-suspended processes when it wants to revive the queue in main memory. Provisions
must be made for passing processes between the new states. This means allowing for new
transitions: activate and suspend between ready and ready-suspended, and between
blocked-suspended and blocked as well, and event-occurred transitions from blocked to ready, and
from blocked-suspended to ready-suspended as well.
2.5.2 Suspending a process
• Indefinitely removes it from contention for time on a processor without being
destroyed
• Useful for detecting security threats and for software debugging purposes
• A suspension may be initiated by the process being suspended or by another process
• A suspended process must be resumed by another process
2.6 Let us Sum Up
In this lesson we have learnt about
a) the Process
b) the process states
c) the process control block
d) the process state transitions
2.7 Points for Discussion
a) Discuss about process control block
b) Discuss about the process transition diagram
2.8 Model answers to “Check your Progress”
A process is more than the program code; it also includes the program counter, the process
stack, and the contents of the processor's registers. The purpose of the process stack is to store
temporary data, such as subroutine parameters, return addresses and temporary variables. All
this information is stored in the Process Control Block (PCB). The Process Control Block
is a record containing many pieces of information associated with a process, including the
process state, program counter, CPU registers, memory-management information, accounting
information, I/O status information, CPU scheduling information, memory limits, and the list
of open files.
2.9 Lesson end Activities
After learning this chapter, try to discuss among your friends and answer these
questions to check your progress.
a) Define Process
b) What are the various possible states of a process
2.10 References
e) Charles Crowley, Chapter 5, 8 of “Operating Systems – A Design-Oriented
Approach”, Tata McGraw-Hill, 2001
f) H.M. Deitel, Chapter 3, 4 of “Operating Systems”, Second Edition, Pearson
Education, 2001
g) Andrew S. Tanenbaum, Chapter 2 of “Modern Operating Systems”, PHI, 1996
h) D.M. Dhamdhere, Chapter 10 of “Systems Programming and Operating Systems”,
Tata McGraw-Hill, 1997
LESSON – 3: INTERRUPT PROCESSING AND CONTEXT SWITCHING
CONTENTS
3.1 Aims and Objectives
3.2 Interrupt Processing
3.2.1 Identifying the High Level Routine
3.2.2 Interrupt Dispatch Table
3.2.3 Rules for Interrupt Processing
3.2.4 Rescheduling while Processing an Interrupt
3.2.5 Interrupt Classes
3.3 Context Switching
3.3.1 Context Switches and Mode Switches
3.3.2 Cost of Context Switching
3.4 Let us Sum Up
3.5 Points for Discussion
3.6 Model answers to “Check your Progress”
3.7 Lesson end Activities
3.8 References
3.1 Aims and Objectives
This lesson focuses on the following concepts
a) Introduction to interrupt processing
b) Interrupt classes
c) Context switching
The main objective of this lesson is to make the student aware of the interrupt
processing, classes and context switching.
3.2 Introduction to Interrupt Processing
An interrupt is an event that alters the sequence in which a processor executes
instructions and it is generated by the hardware of the computer system.
Handling interrupts
• After receiving an interrupt, the processor completes execution of the current
instruction, then pauses the current process
• The processor will then execute one of the kernel’s interrupt-handling functions
• The interrupt handler determines how the system should respond
• Interrupt handlers are stored in an array of pointers called the interrupt vector
• After the interrupt handler completes, the interrupted process is restored and executed,
or the next process is executed
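
The interrupt vector mentioned in the list above can be pictured in C as an array of function pointers indexed by interrupt number. This is only a sketch: the handler names and vector size are invented, and no real kernel lays it out exactly this way.

#include <stdio.h>

#define NVECTORS 4

typedef void (*handler_t)(void);    /* an interrupt-handling function */

void clock_handler(void) { printf("clock tick\n"); }
void disk_handler(void)  { printf("disk transfer complete\n"); }

/* The interrupt vector: handler addresses indexed by interrupt number. */
handler_t interrupt_vector[NVECTORS] = { clock_handler, disk_handler, 0, 0 };

/* Called by the low-level mechanism with the interrupt number. */
void dispatch(int n)
{
    if (n >= 0 && n < NVECTORS && interrupt_vector[n] != 0)
        interrupt_vector[n]();      /* run the registered handler */
}

int main(void)
{
    dispatch(0);    /* simulate a clock interrupt */
    dispatch(1);    /* simulate a disk interrupt  */
    return 0;
}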
Ideally, interrupt handlers would be written in high-level languages so that they are easy to
understand and modify. On the other hand, they should be written in assembly language for
efficiency reasons and because they manipulate hardware registers and use special call/return
sequences that cannot be coded in high-level languages.
To satisfy both goals, some operating systems employ the following two-level strategy. Interrupts
branch to low-level interrupt dispatch routines that are written in assembly language. These
handle low-level tasks such as saving registers and returning from the interrupt when it has
been processed. However, they do little else; they call high-level interrupt routines to do the
bulk of interrupt processing, passing them enough information to identify the interrupting
device. The OS provides three interrupt dispatchers: one to handle input interrupts, one to
handle output interrupts, and one to handle clock interrupts. Input and output dispatchers are
separated for convenience, and a special clock dispatcher is provided for efficiency reasons.
NT provides an even more modular structure. A single routine, called the trap handler,
handles both traps (called exceptions by NT) and interrupts, saving and restoring registers,
which are common to both. If the asynchronous event was an interrupt, then it calls an
interrupt handler. The task of this routine is to raise the processor priority to that of the device
interrupting (so that a lower-level device cannot preempt), call either an internal kernel
routine or an external routine called an ISR, and then restore the processor priority. These
two routines roughly correspond to our high-level interrupt routine and the trap handler
corresponds to our low-level routine. Thus, NT trades off the efficiency of 2 levels for the
reusability of 3 levels.
The device table entry for an input or output interrupt handler points at the high-level
part of the interrupt handler, which is device-specific, and not the low-level part which is
shared by all devices (except the clock).
[Figure: handling interrupts]
3.2.1 Identifying the High-Level Routine
If all input (output) interrupts branch to the same input (output) dispatch routine, how
does the dispatcher know which device-specific interrupt routine to call? The input (output)
dispatch routine needs some way to discover the device that interrupted it, so that it can use
this information to call the appropriate high-level routine. There are several ways to identify
an interrupting device. Here are two of them:
• The dispatcher may use a special machine instruction to get either the device address
or the interrupt vector address of the device (not all machines have such instructions).
• The dispatcher may poll devices until it finds one with an interrupt pending.
The following 'trick', common in operating systems, is used to help identify
the interrupting device. The device descriptor (not the device address) is stored in the second
word of the interrupt vector. Recall that this word stores the value to be loaded into the PS
register when the interrupt routine is called. Some operating systems use the four low-order
bits, which are normally used for condition codes, to store the descriptor of the device. These
four bits are then used to identify the high-level routine.
3.2.2 Interrupt Dispatch Table
An interrupt dispatch table is used to relate device descriptors with (high-level)
interrupt routines. The table is indexed by a device descriptor and each entry contains the
following information:
• The address of the input interrupt routine
• An input code which is passed as an argument to the input interrupt routine
• The address of the output interrupt routine
• An output code which is passed as an argument to the output interrupt routine
The input (output) dispatch routine uses the device descriptor to access the appropriate
dispatch table entry and calls the input (output) interrupt routine with the input (output) code
as an argument. The input and output codes can be anything the high-level routines
need. In some operating systems, they are initially the minor number of the device. Thus only
one interrupt routine is needed for all devices of the same type; the minor number is used to
distinguish between these devices.
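
Based on the four fields just listed, a dispatch table entry and an input dispatcher might be sketched in C as follows; the type names, field names, and NDEVICES are invented for illustration.

#define NDEVICES 16

typedef void (*intr_routine)(int code);   /* a high-level interrupt routine */

struct dispatch_entry {
    intr_routine in_handler;    /* address of the input interrupt routine  */
    int          in_code;       /* argument passed to the input routine,   */
                                /* e.g. the device's minor number          */
    intr_routine out_handler;   /* address of the output interrupt routine */
    int          out_code;      /* argument passed to the output routine   */
};

struct dispatch_entry dispatch_table[NDEVICES];

/* Input dispatcher: use the device descriptor to locate and call
   the device-specific high-level routine.                        */
void input_dispatch(int descriptor)
{
    struct dispatch_entry *e = &dispatch_table[descriptor];
    e->in_handler(e->in_code);
}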
3.2.3 Rules for Interrupt Processing
There are several rules for interrupt processing: First, they should ensure that shared
data are not manipulated simultaneously by different processes. One way to do so is to make
interrupt routines uninterruptible. Thus the PS value stored in the interrupt vector of a device
disables interrupts. This value is loaded when the interrupt handler is invoked. As a result,
the interrupt routine is uninterruptible while the PS maintains this priority.
However, the PS may be changed while the interrupt routine is executing if it calls
resched, which may switch to a process that has interrupts enabled. Therefore, the interrupt
routine has to ensure that it completes changes to global data structures before it makes any
call that results in context switching. An interrupt routine should also make sure that it does
not keep interrupts disabled too long. For instance, if the processor does not accept a
character from an input device before another arrives, data will be lost.
Finally, interrupt routines should never call routines that could block the current
process (that is the process executing when the interrupt occurred) in some queue. Otherwise,
if the interrupt occurs while the null process is executing, the ready list will be empty.
However, resched assumes that there is always some process to execute! Some process
should always be runnable so that interrupts can be executed. Thus interrupt routines may
call only those routines that leave the current process in the ready or current state, and may
not call routines such as wait.
3.2.4 Rescheduling while Processing an Interrupt
We assumed above that interrupt routines could call resched. We now answer the
following questions: First, is it useful to do so? Second, is it safe?
It is useful to call resched from an interrupt routine. An output routine after removing
a character from a buffer may signal a semaphore to allow another process to write data to the
buffer space that it makes available. Similarly, an input routine might send data it obtains
from the device to a process. In each case, the routine resched is called.
It is also safe to call resched. Intuitively it may not seem so, because switching to a
process that has interrupts enabled could lead to a sequence of interrupts piling up until the
stack overflowed. However, such a danger does not exist for the following reason: A process
that is executing an interrupt handler cannot be interrupted again. Some other process,
however, can be. Thus a process's stack will hold the PS and PC value for only one interrupt
and there will never be more interrupts pending than the number of processes in the system.
3.2.5 Interrupt Classes
SVC (supervisor call) interrupts: - These enable software to respond to signals from
hardware. They are initiated by a running process that executes the SVC instruction. An
SVC is a user-generated request for a particular system service, such as performing
input/output, obtaining more storage, or communicating with the system operator. Requiring
users to request services through an SVC helps keep the OS secure from the user.
I/O interrupts: - These are initiated by the input/output hardware. They signal to the CPU
that the status of a channel or device has changed. For example, I/O interrupts are caused
when an I/O operation completes, when an I/O error occurs, or when a device is made ready.
External interrupts: - These are caused by various events including the expiration of a
quantum on an interrupting clock, the pressing of the console’s interrupt key by the operator,
or the receipt of a signal from another processor on a multiprocessor system.
Restart interrupts: - These occur when the operator presses the console’s restart button, or
when a restart SIGP (signal processor) instruction arrives from another processor on a
multiprocessor system.
Program check interrupts: - These occur as a program's machine language instructions are
executed. The problems include division by zero, arithmetic overflow, data in the wrong
format, an attempt to execute an invalid operation code, an attempt to reference beyond the
limits of real memory, an attempt by a user process to execute a privileged instruction, and
attempts to reference protected resources.
Machine check interrupts: - These are caused by malfunctioning hardware.
3.3 Context Switching
A context switch (also sometimes referred to as a process switch or a task switch) is
the switching of the CPU (central processing unit) from one process or thread to another. A
process (also sometimes referred to as a task) is an executing (i.e., running) instance of a
program. In Linux, threads are lightweight processes that can run in parallel and share an
address space (i.e., a range of memory locations) and other resources with their parent
processes (i.e., the processes that created them).
A context is the contents of a CPU's registers and program counter at any point in
time. A register is a small amount of very fast memory inside of a CPU (as opposed to the
slower RAM main memory outside of the CPU) that is used to speed the execution of
computer programs by providing quick access to commonly used values, generally those in
the midst of a calculation. A program counter is a specialized register that indicates the
position of the CPU in its instruction sequence and which holds either the address of the
instruction being executed or the address of the next instruction to be executed, depending on
the specific system.
Context switching can be described in slightly more detail as the kernel (i.e., the core
of the operating system) performing the following activities with regard to processes
(including threads) on the CPU: (1) suspending the progression of one process and storing the
CPU's state (i.e., the context) for that process somewhere in memory, (2) retrieving the
context of the next process from memory and restoring it in the CPU's registers and (3)
returning to the location indicated by the program counter (i.e., returning to the line of code at
which the process was interrupted) in order to resume the process.
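
A minimal sketch of these three steps in C, with the architecture-specific register save and restore represented as plain memory copies (the structure layout and names are purely illustrative):

#define MAX_PROCS 8

/* The context: register contents and program counter. */
struct context {
    unsigned long registers[16];
    unsigned long program_counter;
};

struct context saved[MAX_PROCS];    /* one saved context per process */

void context_switch(int from, int to, struct context *cpu)
{
    saved[from] = *cpu;   /* (1) store the CPU state of the old process  */
    *cpu = saved[to];     /* (2) restore the context of the new process  */
                          /* (3) execution resumes at the new process's
                             saved program counter                       */
}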
A context switch is sometimes described as the kernel suspending execution of one
process on the CPU and resuming execution of some other process that had previously been
suspended. Although this wording can help clarify the concept, it can be confusing in itself
because a process is, by definition, an executing instance of a program. Thus the wording
suspending progression of a process might be preferable.
3.3.1 Context Switches and Mode Switches
Context switches can occur only in kernel mode. Kernel mode is a privileged mode of
the CPU in which only the kernel runs and which provides access to all memory locations
and all other system resources. Other programs, including applications, initially operate in
user mode, but they can run portions of the kernel code via system calls. A system call is a
request in a UNIX-like operating system by an active process (i.e., a process currently
progressing in the CPU) for a service performed by the kernel, such as input/output (I/O) or
process creation (i.e., creation of a new process). I/O can be defined as any movement of
information to or from the combination of the CPU and main memory (i.e. RAM), that is,
communication between this combination and the computer's users (e.g., via the keyboard or
mouse), its storage devices (e.g., disk or tape drives), or other computers.
The existence of these two modes in Unix-like operating systems means that a similar,
but simpler, operation is necessary when a system call causes the CPU to shift to kernel
mode. This is referred to as a mode switch rather than a context switch, because it does not
change the current process.
Context switching is an essential feature of multitasking operating systems. A
multitasking operating system is one in which multiple processes execute on a single CPU
seemingly simultaneously and without interfering with each other. This illusion of
concurrency is achieved by means of context switches that are occurring in rapid succession
(tens or hundreds of times per second). These context switches occur as a result of processes
voluntarily relinquishing their time in the CPU or as a result of the scheduler making the
switch when a process has used up its CPU time slice.
A context switch can also occur as a result of a hardware interrupt, which is a signal
from a hardware device (such as a keyboard, mouse, modem or system clock) to the kernel
that an event (e.g., a key press, mouse movement or arrival of data from a network
connection) has occurred.
Intel 80386 and higher CPUs contain hardware support for context switches.
However, most modern operating systems perform software context switching, which can be
used on any CPU, rather than hardware context switching in an attempt to obtain improved
performance. Software context switching was first implemented in Linux for Intel-compatible
processors with the 2.4 kernel.
One major advantage claimed for software context switching is that, whereas the
hardware mechanism saves almost all of the CPU state, software can be more selective and
save only that portion that actually needs to be saved and reloaded. However, there is some
question as to how important this really is in increasing the efficiency of context switching.
Its advocates also claim that software context switching allows for the possibility of
improving the switching code, thereby further enhancing efficiency, and that it permits better
control over the validity of the data that is being loaded.
3.3.2 The Cost of Context Switching
Context switching is generally computationally intensive. That is, it requires
considerable processor time, which can be on the order of nanoseconds for each of the tens or
hundreds of switches per second. Thus, context switching represents a substantial cost to the
system in terms of CPU time and can, in fact, be the most costly operation on an operating
system.
Consequently, a major focus in the design of operating systems has been to avoid
unnecessary context switching to the extent possible. However, this has not been easy to
accomplish in practice. In fact, although the cost of context switching has been declining
when measured in terms of the absolute amount of CPU time consumed, this appears to be
due mainly to increases in CPU clock speeds rather than to improvements in the efficiency of
context switching itself.
One of the many advantages claimed for Linux as compared with other operating
systems, including some other Unix-like systems, is its extremely low cost of context
switching and mode switching.
Context switches are
– Performed by the OS to stop executing a running process and begin executing
a previously ready process
– Save the execution context of the running process to its PCB
– Load the ready process’s execution context from its PCB
– Must be transparent to processes
– Require the processor to not perform any “useful” computation
• OS must therefore minimize context-switching time
– Performed in hardware by some architectures
3.4 Let us Sum Up
In this lesson we have learnt about
a) the Interrupt Processing
b) the interrupt classes
c) and the context switching
3.5 Points for Discussion
a) Discuss about context switching
b) Discuss about the interrupt classes
3.6 Model answers to “Check your Progress”
A context switch (also sometimes referred to as a process switch or a task switch) is
the switching of the CPU (central processing unit) from one process or thread to another. A
process (also sometimes referred to as a task) is an executing (i.e., running) instance of a
program. In Linux, threads are lightweight processes that can run in parallel and share an
address space (i.e., a range of memory locations) and other resources with their parent
processes (i.e., the processes that created them).
3.7 Lesson - end Activities
After learning this chapter, try to discuss among your friends and answer these
questions to check your progress.
a) Discuss about the interrupt
b) Discuss about various types of interrupt processing
3.8 References
a) Charles Crowley, Chapter 5, 8 of “Operating Systems – A Design-Oriented
Approach”, Tata McGraw-Hill, 2001
b) H.M. Deitel, Chapter 3, 4 of “Operating Systems”, Second Edition, Pearson
Education, 2001
c) Andrew S. Tanenbaum, Chapter 2 of “Modern Operating Systems”, PHI, 1996
d) D.M. Dhamdhere, Chapter 10 of “Systems Programming and Operating Systems”,
Tata McGraw-Hill, 1997
LESSON – 4: SEMAPHORES
CONTENTS
4.1 Aims and Objectives
4.2 Introduction to process synchronization
4.3 Critical section problem
4.4 Semaphores
4.5 Classical problems of synchronization
4.6 Let us Sum Up
4.7 Points for discussion
4.8 Model answers to Check your progress
4.9 Lesson end Activities
4.10 References
4.1 Aims and Objectives
The aim of this lesson is to learn the concept of process synchronization, critical
section problem, semaphores and classical problems of synchronization
The objectives of this lesson are to make the student aware of the following concepts
a) process synchronization
b) critical section problem
c) semaphores
d) and classical problems of synchronization
4.2 Introduction to process synchronization
Process synchronization will become clear with the following example. Consider the code for
a producer and a consumer, as follows.
Producer
while (1)
{
    while (counter == buffersize);   /* busy-wait while the buffer is full */
    buffer[in] = nextproduced;
    in = (in + 1) % buffersize;
    counter++;
}

Consumer
while (1)
{
    while (counter == 0);            /* busy-wait while the buffer is empty */
    nextconsumed = buffer[out];
    out = (out + 1) % buffersize;
    counter--;
}
Both pieces of code are correct separately but will not function correctly when executed
concurrently. This is because counter++ may be executed in machine language as three
separate statements:
register1 = counter
register1 = register1 + 1
counter = register1
and counter-- as:
register2 = counter
register2 = register2 - 1
counter = register2
The execution of these statements for the two processes may lead, for example, to the
following interleaving (with counter initially 5):
a) producer executes register1 = counter (register1 = 5)
b) producer executes register1 = register1 + 1 (register1 = 6)
c) consumer executes register2 = counter (register2 = 5)
d) consumer executes register2 = register2 - 1 (register2 = 4)
e) producer executes counter = register1 (counter = 6)
f) consumer executes counter = register2 (counter = 4)
You can see that the answer counter =4 is wrong as there are 5 full buffers.
A situation like this, where several processes access and manipulate the same data
concurrently and the outcome of the execution depends on the particular order in which the
access takes place, is called a race condition. To prevent this, we have to make sure that only
one process at a time manipulates the counter. Such situations occur frequently in operating
systems, and we require some form of synchronization of processes.
4.3 Critical section problem
Each process will be having a segment of code called a critical section, in which the
process may be changing a common variable, updating a table, writing a file, and so on.
do
{
    entry section
        critical section
    exit section
        remainder section
} while(1);
A solution should satisfy the following three requirements
a) Mutual exclusion: If a process is executing in its critical section, then no other
processes can be executing in their critical sections
b) Progress: If no process is executing in its critical section and some processes wish to
enter their critical sections, then only those processes that are not executing in their
remainder section can participate in the decision on which will enter its critical
section next, and this selection cannot be postponed indefinitely.
c) Bounded waiting: There exists a bound on the number of times that other processes
are allowed to enter their critical sections after a process has made a request to enter
its critical section and before that request is granted.
There are many solutions available, using various implementation methods. One of the
multiprocess solutions for the critical section problem, Lamport's bakery algorithm, is given below.
Data structure
boolean choosing[n];
int number[n];

do
{
    choosing[i] = true;
    number[i] = max(number[0], number[1], …, number[n-1]) + 1;
    choosing[i] = false;
    for (j = 0; j < n; j++)
    {
        while (choosing[j]);
        /* wait while process j holds a smaller ticket, breaking ties
           by process id: (number[j], j) < (number[i], i)            */
        while ((number[j] != 0) &&
               ((number[j] < number[i]) ||
                (number[j] == number[i] && j < i)));
    }
        CRITICAL SECTION
    number[i] = 0;
        REMAINDER SECTION
} while(1);
4.4 Semaphores
The solution described in the above section cannot be generalized most of the time. To
overcome this, we have a synchronization tool called a semaphore, proposed by Dijkstra.
A semaphore is an integer variable (paired, in blocking implementations, with a queue of
waiting process ids) that, apart from initialization, is accessed only through two standard
atomic operations: wait and signal.
 wait: decrease the counter by one; if it becomes negative, block the process and enter
its id in the queue.
 signal: increase the counter by one; if it is still not positive, unblock the first process of
the queue, removing its id from the queue itself.
A simple busy-waiting implementation of these two operations is:

wait(S)
{
    while (S <= 0);   /* busy-wait until the semaphore is positive */
    S--;
}

signal(S)
{
    S++;
}
The atomicity of the above operations is an essential requirement: mutual exclusion while
accessing a semaphore must be strictly enforced by the operating system. This is often done
by implementing the operations themselves as uninterruptible system calls.
It's easy to see how a semaphore can be used to enforce mutual exclusion on a shared
resource: a semaphore is assigned to the resource, it's shared among all processes that need to
access the resource, and its counter is initialized to 1. A process then waits on the semaphore
upon entering the critical section for the resource, and signals on leaving it. The first process
will get access. If another process arrives while the former is still in the critical section, it'll
be blocked, and so will further processes. In this situation the absolute value of the counter is
equal to the number of processes waiting to enter the critical section. Every process leaving
the critical section will let a waiting process use the resource by signaling the semaphore.
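
On a Unix system this pattern can be experimented with using POSIX semaphores, assuming a platform that supports sem_init; note that POSIX names the signal operation sem_post. A minimal sketch (compile with -pthread):

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

sem_t mutex;                /* semaphore guarding the critical section */
int shared_counter = 0;     /* the shared resource                     */

void *worker(void *arg)
{
    int i;
    for (i = 0; i < 100000; i++) {
        sem_wait(&mutex);   /* wait: enter the critical section   */
        shared_counter++;   /* critical section                   */
        sem_post(&mutex);   /* signal: leave the critical section */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    sem_init(&mutex, 0, 1);                     /* counter initialized to 1 */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", shared_counter);   /* always 200000 */
    sem_destroy(&mutex);
    return 0;
}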
In order to fully understand semaphores, we'll discuss them briefly before engaging any
system calls and operational theory.
The name semaphore is actually an old railroad term, referring to the crossroad "arms"
that prevent cars from crossing the tracks at intersections. The same can be said about a
simple semaphore set. If the semaphore is on (the arms are up), then a resource is available
(cars may cross the tracks). However, if the semaphore is off (the arms are down), then
resources are not available (the cars must wait).
While this simple example may stand to introduce the concept, it is important to realize
that semaphores are actually implemented as sets, rather than as single entities. Of course, a
given semaphore set might only have one semaphore, as in our railroad example.
Perhaps another approach to the concept of semaphores is to think of them as resource
counters. Let's apply this concept to another real world scenario. Consider a print spooler,
capable of handling multiple printers, with each printer handling multiple print requests. A
hypothetical print spool manager will utilize semaphore sets to monitor access to each printer.
Assume that in our corporate print room, we have 5 printers online. Our print spool manager
allocates a semaphore set with 5 semaphores in it, one for each printer on the system. Since
each printer is only physically capable of printing one job at a time, each of our five
semaphores in our set will be initialized to a value of 1 (one), meaning that they are all
online and accepting requests.
John sends a print request to the spooler. The print manager looks at the semaphore set,
and finds the first semaphore which has a value of one. Before sending John's request to the
physical device, the print manager decrements the semaphore for the corresponding printer
by one. Now, that semaphore's value is zero. In the world of System V semaphores, a value
of zero represents 100% resource utilization on that semaphore. In our example, no other
request can be sent to that printer until its semaphore is no longer equal to zero.
When John's print job has completed, the print manager increments the value of the
semaphore which corresponds to the printer. Its value is now back up to one (1), which means
it is available again. Naturally, if all 5 semaphores had a value of zero, that would indicate
that they are all busy printing requests, and that no printers are available.
Although this was a simple example, please do not be confused by the initial value of one
(1) which was assigned to each semaphore in the set. Semaphores, when thought of as
resource counters, may be initialized to any positive integer value, and are not limited to
either being zero or one. If it were possible for each of our five printers to handle 10 print
jobs at a time, we could initialize each of our semaphores to 10, decrementing by one for
every new job, and incrementing by one whenever a print job was finished. Semaphores have
a close working relationship with shared memory segments, acting as a watchdog to prevent
multiple writes to the same memory segment.
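
The print-spooler scenario above corresponds to System V semaphore sets, which Unix exposes through the semget, semctl and semop calls. The following minimal sketch claims and releases one printer; error handling is omitted and portability details (such as the semun union sometimes required by semctl) are glossed over.

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
    /* a set of 5 semaphores, one per printer */
    int semid = semget(IPC_PRIVATE, 5, IPC_CREAT | 0600);
    struct sembuf op;

    semctl(semid, 0, SETVAL, 1);    /* printer 0: online and free  */

    op.sem_num = 0;                 /* printer 0                   */
    op.sem_op  = -1;                /* decrement by one: claim it  */
    op.sem_flg = 0;
    semop(semid, &op, 1);           /* value 1 -> 0: printer busy  */

    /* ... the job is sent to the printer here ... */

    op.sem_op = 1;                  /* increment: release it         */
    semop(semid, &op, 1);           /* value 0 -> 1: available again */

    semctl(semid, 0, IPC_RMID);     /* remove the semaphore set    */
    return 0;
}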
4.5 Classic problems of synchronization
There are several classic problems of synchronization, among them the bounded-buffer
problem, the readers-writers problem, and the dining philosophers problem. In this section we
explain only the solution of the bounded-buffer problem, namely the producer-consumer problem.
The solution for the producer-consumer problem can be achieved using semaphores as shown
in the following code, where mutex, empty and full are semaphores initialized to 1, n and 0
respectively.
CODE FOR PRODUCER
do
{
…
produce an item in nextp
…
wait(empty);
wait(mutex);
…
add nextp to buffer
…
signal(mutex);
signal(full);
}while(1);
CODE FOR CONSUMER
do
{
wait(full);
wait(mutex);
…
remove an item from buffer to nextc
…
signal(mutex);
signal(empty);
…
consume the item in nextc
…
}while(1);
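The pseudocode above can be fleshed out into a runnable program. The following is a
sketch using POSIX threads and unnamed semaphores; the buffer size N = 10 and the
number of items produced are illustrative assumptions (compile with -pthread on Linux).

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N 10                          /* buffer capacity (the "n" above) */

static int buffer[N];
static int in = 0, out = 0;           /* insert/remove positions */
static sem_t mutex, empty, full;      /* initialized to 1, N and 0 */

static void *producer(void *arg)
{
    for (int i = 1; i <= 20; i++) {
        sem_wait(&empty);             /* wait(empty)  */
        sem_wait(&mutex);             /* wait(mutex)  */
        buffer[in] = i;               /* add nextp to buffer */
        in = (in + 1) % N;
        sem_post(&mutex);             /* signal(mutex) */
        sem_post(&full);              /* signal(full)  */
    }
    return NULL;
}

static void *consumer(void *arg)
{
    for (int i = 0; i < 20; i++) {
        sem_wait(&full);              /* wait(full)   */
        sem_wait(&mutex);             /* wait(mutex)  */
        int item = buffer[out];       /* remove an item from buffer to nextc */
        out = (out + 1) % N;
        sem_post(&mutex);             /* signal(mutex) */
        sem_post(&empty);             /* signal(empty) */
        printf("consumed %d\n", item);/* consume the item in nextc */
    }
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    sem_init(&mutex, 0, 1);
    sem_init(&empty, 0, N);
    sem_init(&full, 0, 0);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}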
4.6 Let us Sum Up
In this lesson we have learnt about
a) the process synchronization
b) the critical section problem
c) the Semaphores
d) and the classical problems of synchronization
4.7 Points for Discussion
After learning this chapter, try to discuss among your friends and answer these
questions to check your progress.
a) What is synchronization?
b) Define critical section problem.
c) Define Semaphore.
d) Discuss about the need of Semaphores.
e) Discuss about synchronization based on an example.
4.8 Model answers to “Check your Progress”
A semaphore, in computer science, is a protected variable (or abstract data type)
which constitutes the classic method for restricting access to shared resources, such as shared
memory, in a multiprogramming environment. A semaphore is a counter for a set of available
resources, rather than a locked/unlocked flag of a single resource. It was invented by Edsger
Dijkstra and first used in the THE operating system. The value of the semaphore is
initialized to the number of equivalent shared resources being controlled. In the special case
where there is a single equivalent shared resource, the semaphore is called a binary
semaphore. The general-case semaphore is often called a counting semaphore.
Semaphores are the classic solution to the dining philosophers problem, although they do not
prevent all resource deadlocks.
4.9 Lesson end Activities
Try to implement semaphore in C under Unix
4.10 References
a) Charles Crowley, Chapter 8 of “Operating Systems – A Design-Oriented
Approach”, Tata McGraw-Hill, 2001
b) H.M. Deitel, Chapter 4, 5 of “Operating Systems”, Second Edition, Pearson
Education, 2001
c) Andrew S. Tanenbaum, Chapter 11 of “Modern Operating Systems”, PHI, 1996
d) D.M. Dhamdhere, Chapter 13 of “Systems Programming and Operating Systems”,
Tata McGraw-Hill, 1997
LESSON – 5: DEADLOCK AND INDEFINITE POSTPONEMENT
CONTENTS
5.1 Aims and Objectives
5.2 Introduction
5.3 Characteristics of Deadlock
5.4 Deadlock prevention and avoidance
5.5 Deadlock detection and recovery
5.6 Let us Sum Up
5.7 Points for discussion
5.8 Model answers to Check your Progress
5.9 Lesson - end Activities
5.10 References
5.1 Aims and Objectives
The aim of this lesson is to learn the concept of Deadlock and indefinite postponement.
The objectives of this lesson are to make the student aware of the following concepts
a) Deadlock prevention
b) Deadlock Avoidance
c) Deadlock detection
d) Deadlock recovery
5.2 Introduction
One problem that arises in multiprogrammed systems is deadlock. A process or thread
is in a state of deadlock (or is deadlocked) if the process or thread is waiting for a particular
event that will not occur. In a system deadlock, one or more processes are deadlocked. Most
deadlocks develop because of the normal contention for dedicated resources (i.e., resources
that may be used by only one user at a time). Circular wait is characteristic of deadlocked
systems.
One example of a system that is prone to deadlock is a spooling system. A common
solution is to restrain the input spoolers so that, when the spooling files begin to reach some
saturation threshold, they do not read in more print jobs. Today's systems allow printing to
begin before the job is completed so that a full, or nearly full, spooling file can be emptied or
partially cleared even while a job is still executing. This concept has been applied to
streaming audio and video clips, where the audio and video begin to play before the clips are
fully downloaded.
In any system that keeps processes waiting while it makes resource-allocation and
process scheduling decisions, it is possible to delay indefinitely the scheduling of a process
while other processes receive the system's attention. This situation, variously called indefinite
postponement, indefinite blocking, or starvation, can be as devastating as deadlock. Indefinite
postponement may occur because of biases in a system's resource scheduling policies. Some
systems prevent indefinite postponement by increasing a process's priority as it waits for a
resource—this technique is called aging.
Resources can be preemptible (e.g., processors and main memory), meaning that they
can be removed from a process without loss of work, or nonpreemptible (e.g., tape drives and
optical scanners), meaning that they cannot be removed from the processes to which they are
assigned. Data and programs certainly are resources that the operating system must
control and allocate. Code that cannot be changed while in use is said to be reentrant. Code
that may be changed but is reinitialized each time it is used is said to be serially reusable.
Reentrant code may be shared by several processes simultaneously, whereas serially reusable
code may be used by only one process at a time. When we call particular resources shared,
we must be careful to state whether they may be used by several processes simultaneously or
by only one of several processes at a time. The latter kind—serially reusable resources—are
the ones that tend to become involved in deadlocks.
5.3 Characteristics of Deadlock
The four necessary conditions for deadlock are:
a) A resource may be acquired exclusively by only one process at a time (mutual
exclusion condition);
b) A process that has acquired an exclusive resource may hold it while waiting to
obtain other resources (wait-for condition, also called the hold-and-wait condition);
c) Once a process has obtained a resource, the system cannot remove the resource
from the process's control until the process has finished using the resource (no-preemption condition);
d) And two or more processes are locked in a "circular chain" in which each process
in the chain is waiting for one or more resources that the next process in the chain is
holding (circular-wait condition).
Because these are necessary conditions for a deadlock to exist, the existence of a deadlock
implies that each of them must be in effect. Taken together, all four conditions are necessary
and sufficient for deadlock to exist (i.e., if all these conditions are in place, the system is
deadlocked).
The four major areas of interest in deadlock research are deadlock prevention, deadlock
avoidance, deadlock detection, and deadlock recovery.
5.4 Deadlock prevention and avoidance
5.4.1 Deadlock prevention
In deadlock prevention our concern is to condition a system to remove any possibility
of deadlocks occurring. Havender observed that a deadlock cannot occur if a system denies
any of the four necessary conditions.
The first necessary condition, namely that processes claim exclusive use of the
resources they require, is not one that we want to break, because we specifically want to
allow dedicated (i.e., serially reusable) resources.
Denying the "wait-for" condition requires that all of the resources a process needs to
complete its task be requested at once, which can result in substantial resource
underutilization and raises concerns over how to charge for resources.
Denying the "no-preemption" condition can be costly, because processes lose work
when their resources are preempted.
Denying the "circular-wait" condition uses a linear ordering of resources to prevent
deadlock. This strategy can increase efficiency over the other strategies, but not without
difficulties.
5.4.2 Deadlock avoidance
In deadlock avoidance the goal is to impose less stringent conditions than in deadlock
prevention in an attempt to get better resource utilization. Avoidance methods allow the
possibility of deadlock to loom, but whenever a deadlock is approached, it is carefully
sidestepped. Dijkstra's Banker's Algorithm is an example of a deadlock avoidance algorithm.
In the Banker's Algorithm, the system ensures that a process's maximum resource need does
not exceed the number of available resources. The system is said to be in a safe state if the
operating system can guarantee that all current processes can complete their work within a
finite time. If not, then the system is said to be in an unsafe state. Dijkstra's Banker's
Algorithm requires that resources be allocated to processes only when the allocations result in
safe states. It has a number of weaknesses (such as requiring a fixed number of processes and
resources) that prevent it from being implemented in real systems.
A deadlock avoidance algorithm dynamically examines the resource allocation state
to ensure that there can never be a circular wait condition. The resource allocation state is
defined by the number of available and allocated resources, and the maximum demands of
the processes. A state is safe if the system can allocate resources to each process (up to its
maximum) in some order and still avoid a deadlock.
5.4.3 Banker’s algorithm
Let ‘Available’ be a vector of length m indicating the number of available resources
of each type, ‘Max’ be an ‘n x m’ matrix defining the maximum demand of each process,
‘Allocation’ be an ‘n x m’ matrix defining the number of resources of each type currently
allocated to each process, and let ‘need’ be an ‘n x m’ matrix indicating the remaining
resource need of each process.
Let Requesti be the request vector for process pi. If Requesti[j] = k, then process pi
wants k instances of resource type rj. When a request for resources is made by process pi, the
following actions are taken:
a) If Requesti <= Needi, then proceed to step b. Else the process has exceeded its
maximum claim.
b) If Requesti <= Available, then proceed to step c. Else the resources are not
available and pi must wait.
c) The system pretends to have allocated the requested resources to process pi by
modifying the state as follows:
Available = Available - Requesti
Allocationi = Allocationi + Requesti
Needi = Needi - Requesti
If the resulting resource allocation state is safe, the transaction is completed and process pi is
allocated its resources. If the new state is unsafe, then pi must wait for Requesti and the old
resource allocation state is restored.
5.4.4 Safety Algorithm
The algorithm for finding out whether a system is in a safe state or not can be
described as follows.
a) Let Work and Finish be vectors of length m and n respectively. Initialize Work =
Available and Finish[i] = false for all i.
b) Find an i such that
a. Finish[i] = false and
b. Needi <= Work
If no such i exists, go to step d.
c) Work = Work + Allocationi
Finish[i] = true
Go to step b.
d) If Finish[i] = true for all i, then the system is in a safe state.
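The safety test above, together with the request-handling steps of 5.4.3, can be sketched
in C as follows (see also the lesson-end activity in 5.9). The dimensions N and M, the
global matrices and the function names are assumptions made for illustration; a driver
that initializes Available, Allocation and Need is left to the reader.

#include <stdbool.h>
#include <string.h>

#define N 5   /* number of processes */
#define M 3   /* number of resource types */

int Available[M];
int Allocation[N][M], Need[N][M];

/* Safety algorithm (5.4.4): returns true if the current state is safe. */
bool is_safe(void)
{
    int Work[M];
    bool Finish[N] = { false };
    memcpy(Work, Available, sizeof Work);
    for (;;) {
        int i, j = 0;
        for (i = 0; i < N; i++) {             /* step b: find a candidate */
            if (Finish[i]) continue;
            for (j = 0; j < M; j++)
                if (Need[i][j] > Work[j]) break;
            if (j == M) break;                /* process i can finish */
        }
        if (i == N) break;                    /* no such i: go to step d */
        for (j = 0; j < M; j++)               /* step c: reclaim its resources */
            Work[j] += Allocation[i][j];
        Finish[i] = true;
    }
    for (int i = 0; i < N; i++)               /* step d */
        if (!Finish[i]) return false;
    return true;
}

/* Request handling (5.4.3): grant Request for process i only if the
 * resulting state is safe; otherwise roll back and make pi wait. */
bool request_resources(int i, const int Request[M])
{
    for (int j = 0; j < M; j++)
        if (Request[j] > Need[i][j]) return false;   /* exceeded its claim */
    for (int j = 0; j < M; j++)
        if (Request[j] > Available[j]) return false; /* must wait */
    for (int j = 0; j < M; j++) {                    /* pretend to allocate */
        Available[j] -= Request[j];
        Allocation[i][j] += Request[j];
        Need[i][j] -= Request[j];
    }
    if (is_safe()) return true;
    for (int j = 0; j < M; j++) {                    /* unsafe: restore state */
        Available[j] += Request[j];
        Allocation[i][j] -= Request[j];
        Need[i][j] += Request[j];
    }
    return false;
}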
5.5 Deadlock detection and recovery
Deadlock detection methods are used in systems in which deadlocks can occur. The
goal is to determine if a deadlock has occurred, and to identify those processes and resources
involved in the deadlock. Deadlock detection algorithms can incur significant runtime
overhead. To facilitate the detection of deadlocks, a directed graph indicates resource
allocations and requests. Deadlock can be detected using graph reductions. If a process's
resource requests may be granted, then we say that a graph may be reduced by that process. If
a graph can be reduced by all its processes, then there is no deadlock. If a graph cannot be
reduced by all its processes, then the irreducible processes constitute the set of deadlocked
processes in the graph.
a) Let Work and Finish be vectors of length m and n respectively. Initialize Work =
Available. For i = 1, …, n, if Allocationi ≠ 0 then Finish[i] = false, else
Finish[i] = true.
b) Find an i such that
a. Finish[i] = false and
b. Requesti <= Work
If no such i exists, go to step d.
c) Work = Work + Allocationi
Finish[i] = true
Go to step b.
d) If Finish[i] = false for some i, then the system is in a deadlocked state.
Deadlock recovery methods are used to clear deadlocks from a system so that it may
operate free of the deadlocks, and so that the deadlocked processes may complete their
execution and free their resources. Recovery typically requires that one or more of the
deadlocked processes be flushed from the system. The suspend/resume mechanism allows the
system to put a temporary hold on a process (temporarily preempting its resources), and,
when it is safe to do so, resume the held process without loss of work. Checkpoint/rollback
facilitates suspend/resume capabilities by limiting the loss of work to the time at which the
last checkpoint (i.e., saved state of the system) was taken. When a process in a system
terminates (by accident or intentionally as the result of a deadlock recovery algorithm), the
system performs a rollback by undoing every operation related to the terminated process that
occurred since the last checkpoint. To ensure that data in the database remains in a consistent
state when deadlocked processes are terminated, database systems typically perform resource
allocations using transactions.
In personal computer systems and workstations, deadlock has generally been viewed
as a limited annoyance. Some systems implement the basic deadlock prevention methods
suggested by Havender, while others ignore the problem—these methods seem to be
satisfactory. While ignoring deadlocks may seem dangerous, this approach can actually be
rather efficient. If deadlock is rare, then the processor time devoted to checking for deadlocks
significantly reduces system performance. However, given current trends, deadlock will
continue to be an important area of research as the number of concurrent operations and
number of resources becomes large, increasing the likelihood of deadlock in multiprocessor
and distributed systems. Also, many real-time systems, which are becoming increasingly
prevalent, require deadlock-free resource allocation.
5.6 Let us Sum Up
In this lesson we have learned about the characteristics of deadlock, deadlock
prevention mechanism, deadlock avoidance using bankers and safety algorithms, deadlock
detection and recovery.
5.7 Points for discussion
After learning this chapter, try to discuss among your friends and answer these
questions to check your progress.
a) What are the characteristics of deadlock?
b) Discuss about the deadlock prevention mechanism.
c) Discuss about bankers and safety algorithm.
d) How can deadlock be detected?
e) What are the steps needed for deadlock recovery?
5.8 Model answers to Check your Progress
The characteristic of deadlock can be explained by means of the four necessary
conditions for deadlock namely, a) A resource may be acquired exclusively by only one
process at a time (mutual exclusion condition); b) A process that has acquired an exclusive
resource may hold it while waiting to obtain other resources (wait-for condition, also called
the hold-and-wait condition); c) Once a process has obtained a resource, the system cannot
remove the resource from the process's control until the process has finished using the
resource (no-preemption condition); d) And two or more processes are locked in a "circular
chain" in which each process in the chain is waiting for one or more resources that the next
process in the chain is holding (circular-wait condition).
Because these are necessary conditions for a deadlock to exist, the existence of a
deadlock implies that each of them must be in effect. Taken together, all four conditions are
necessary and sufficient for deadlock to exist (i.e., if all these conditions are in place, the
system is deadlocked).
5.9 Lesson - end Activities
Try to write a program in C/C++ to implement bankers and safety algorithms
5.10 References
i) Charles Crowley, Chapter 8 of “Operating Systems – A Design-Oriented
Approach”, Tata McGraw-Hill, 2001
j) H.M. Deitel, Chapter 6 of “Operating Systems”, Second Edition, Pearson
Education, 2001
k) Andrew S. Tanenbaum, Chapter 6 of “Modern Operating Systems”, PHI, 1996
l) D.M. Dhamdhere, Chapter 12 of “Systems Programming and Operating Systems”,
Tata McGraw-Hill, 1997.
UNIT – II
LESSON – 6: STORAGE MANAGEMENT
CONTENTS
6.1 Aims and Objectives
6.2 Introduction
6.3 Contiguous storage allocation
6.4 Non-Contiguous Storage Allocation
6.5 Fixed Partitions Multiprogramming
6.6 Variable Partitions Multiprogramming
6.7 Multiprogramming with Storage Swapping
6.8 Let us Sum Up
6.9 Points for discussion
6.10 Model answer to Check your Progress
6.11 Lesson - end Activities
6.12 References
6.1 Aims and Objectives
The aim of this lesson is to learn the concept of Real storage management strategies.
The objectives of this lesson are to make the student aware of the following concepts
a) Contiguous storage allocation
b) Non Contiguous storage allocation
c) Fixed partition multiprogramming
d) Variable partition multiprogramming
e) and multiprogramming with storage swapping
6.2 Introduction
The organization and management of the main memory or primary memory or real
memory of a computer system has been one of the most important factors influencing
operating systems design. Regardless of what storage organization scheme we adopt for a
particular system, we must decide what strategies to use to obtain optimal performance.
Storage management strategies are of three types, as described below.
a) Fetch strategies – concerned with when to obtain the next piece of program or
data for transfer to main storage from secondary storage
a. Demand fetch – in which the next piece of program or data is brought into
the main storage when it is referenced by a running program
b. Anticipatory fetch strategies – where we make guesses about future
program behaviour, which can yield improved system performance
b) Placement strategies – concerned with determining where in main storage to
place an incoming program. Examples are first fit, best fit and worst fit
c) Replacement strategies – concerned with determining which piece of program
or data to displace to make room for incoming programs
6.3 Contiguous Storage Allocation
In contiguous storage allocation each program has to occupy a single contiguous
block of storage locations. The simplest memory management scheme is the bare machine
concept, where the user is provided with the complete control over the entire memory space.
The next simplest scheme is to divide memory into two sections, one for the user and
one for the resident monitor of the operating system. Protection hardware can be provided
in the form of a fence register to protect the monitor code and data from changes by the user
program.
The resident monitor memory management scheme may seem of little use, since it
appears to be inherently single user. Early systems worked around this by swapping: when
the system switched to the next user, the current contents of user memory were written out
to backing storage and the memory image of the next user was read in.
6.4 Non-Contiguous Storage Allocation
Memory is divided into a number of regions or partitions. Each region may have
one program to be executed. Thus the degree of multiprogramming is bounded by the
number of regions. When a region is free, a program is selected from the job queue and
loaded into the free regions. Two major schemes are multiple contiguous fixed partition
allocation and multiple contiguous variable partition allocation.
6.5 Fixed Partitions Multiprogramming
Fixed partitions multiprogramming is also called multiprogramming with a fixed
number of tasks (MFT), or multiple contiguous fixed partition allocation. MFT has the
following properties.
 Several users simultaneously compete for system resources; the system can
switch between I/O jobs and calculation jobs, for instance
 Relocation and transfers between partitions are allowed
 Protection is implemented by the use of several boundary registers: low and high
boundary registers, or a base register with a length
 Fragmentation occurs if user programs cannot completely fill a partition - wasteful
All the jobs that enter the system are put into queues. Each partition has its own job
queue, as shown in the following figure. The job scheduler takes into account the memory
requirements of each job and the available regions in determining which jobs are allocated
memory. When a job is allocated space, it is loaded into a region and can then compete for
the CPU. When a job terminates, it releases its memory region, which the job scheduler may
then fill with another job from the job queue. Another way is to allow a single unified queue,
where the decision of choosing a job reflects the choice between a best-fit-only and a
best-available-fit job memory allocation policy.
Figure: Multiprogramming - fixed partitions
6.6 Variable Partitions Multiprogramming
Variable partitions multiprogramming is also called multiprogramming with a variable
number of tasks (MVT), or multiple contiguous variable partition allocation. In this scheme,
there are no fixed partitions. The memory is divided into regions and allocated to programs
as and when required.
Figure: Multiprogramming - variable partitions
MVT has the following properties
 Variable partitions - allowing jobs to use as much space as they need (the limit
being the complete memory space)
 No need to divide jobs into types - this reduces waste when jobs cannot fill a
partition
However, wastage is still not completely eliminated. The OS keeps a table indicating
which parts of memory are available and which are occupied. Initially all memory is
available for user programs, and is considered as one large block of available memory, a hole.
When a job arrives and needs memory, we search for a hole large enough for this job. If we
find one, we allocate only as much as is needed, keeping the rest available to satisfy future
requests. The most common algorithms for allocating memory are first-fit and best-fit.
Figure: Allocating Memory
Once a block of memory has been allocated to a job, its program can be loaded into
that space and executed. The minimal hardware support needed is the same as with MFT:
two registers containing the upper and lower bounds of the region of memory allocated to this
job. When the CPU scheduler selects this process, the dispatcher loads these bounds registers
with the correct values. Since every address generated by the cpu is checked against these
registers, we can protect other users' programs and data from being modified by this running
process.
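The first-fit and best-fit policies mentioned above can be sketched as follows. The
hole-table representation and the function names are illustrative assumptions; real
systems keep the free list in one of the structures discussed in 6.10.

#include <stddef.h>

struct hole { size_t base, size; };

/* First fit: return the index of the first hole big enough, or -1. */
int first_fit(const struct hole h[], int nholes, size_t request)
{
    for (int i = 0; i < nholes; i++)
        if (h[i].size >= request)
            return i;
    return -1;
}

/* Best fit: return the index of the smallest hole big enough, or -1. */
int best_fit(const struct hole h[], int nholes, size_t request)
{
    int best = -1;
    for (int i = 0; i < nholes; i++)
        if (h[i].size >= request && (best < 0 || h[i].size < h[best].size))
            best = i;
    return best;
}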
6.7 Multiprogramming with Storage Swapping
Multiprogramming with storage swapping has the following features. It is different from the
previous schemes, where user programs remain in memory until completion.
 A job is swapped out of main storage (to secondary storage) if it requires service from
some external routine (so another process can use the CPU)
 A job may typically need to be swapped in and out many times before completion
 Main storage can now hold many user programs active at the same time (swapping in
and out of main memory)
 A bit more space than necessary is allocated, so the process can grow during execution
 The area on disk where this happens is the swap space (usually the /tmp area in unix)
 Sometimes, swap space is automatically allocated on process creation (hence, a fixed
swap space per process)
 An average process in the middle of memory (after the system has reached equilibrium)
will encounter half allocations and half deallocations (above and below)
 The fifty percent rule: if the mean number of processes in memory is n, the mean
number of holes is n/2
 That is because adjacent holes are merged while adjacent processes are not (hence a
process/hole asymmetry) - just a heuristic
6.8 Let us sum up
In this lesson we have learnt about the concept of Real storage management strategies
like a) Contiguous storage allocation, b) Non Contiguous storage allocation, c) Fixed
partition multiprogramming, d) Variable partition multiprogramming, e) and
multiprogramming with storage swapping
6.9 Points for discussion
a) What happens if no job in queue can meet the size of the slot left by the departing job?
b) How do we keep track of memory in implementation
6.10 Model answers to Check your Progress
The answers for the questions given in 6.9 are
a) This leads to the creation of holes in main storage that must be filled.
b) We can keep track of memory in implementation by use of
i) linked lists,
ii) the buddy system (dividing memory according to powers of 2) - which leads to
big wastes (checkerboarding or external fragmentation),
iii) or bitmaps (where a 0 bit indicates that a block is free, and a 1 that it is
allocated); a small bitmap sketch follows.
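To illustrate point iii), here is a small self-contained C sketch of bitmap-based
tracking; the number of allocation units and the helper names are assumptions made
for illustration.

#include <stdint.h>

#define UNITS 1024
static uint8_t bitmap[UNITS / 8];      /* one bit per allocation unit */

static int unit_is_free(int u) { return !(bitmap[u / 8] & (1 << (u % 8))); }
static void set_unit(int u)    { bitmap[u / 8] |= (1 << (u % 8)); }

/* Find 'count' consecutive free units and mark them allocated;
 * returns the first unit number, or -1 if no free run is large enough. */
int bitmap_alloc(int count)
{
    int run = 0;
    for (int u = 0; u < UNITS; u++) {
        run = unit_is_free(u) ? run + 1 : 0;
        if (run == count) {
            int first = u - count + 1;
            for (int i = first; i <= u; i++)
                set_unit(i);
            return first;
        }
    }
    return -1;
}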
6.11 Lesson - end activities
After learning this chapter, try to discuss among your friends and answer these
questions to check your progress.
o Differentiate between MFT and MVT
o Advantages of swapping
6.12 References
m) Charles Crowley, Chapter 10, 11, 12 of “Operating Systems – A Design-Oriented
Approach”, Tata McGraw-Hill, 2001
n) H.M. Deitel, Chapter 7, 8, 9 of “Operating Systems”, Second Edition, Pearson
Education, 2001
o) Andrew S. Tanenbaum, Chapter 3 of “Modern Operating Systems”, PHI, 1996
p) D.M. Dhamdhere, Chapter 15 of “Systems Programming and Operating Systems”,
Tata McGraw-Hill, 1997
LESSON 7 – VIRTUAL STORAGE
CONTENTS
7.1 Aims and Objectives
7.2 Introduction
7.2.1 Overlays
7.2.2 Dynamic Loading
7.3 Virtual storage management strategies
7.4 Steps in handling page fault
7.5 Page replacement algorithms
7.5.1 FIFO
7.5.2 Optimal replacement
7.5.3 Least recently used
7.6 Working sets
7.7 Demand paging
7.8 Page size
7.9 Let us Sum Up
7.10 Points for discussion
7.11 Model answers to check your progress
7.12 Lesson end Activities
7.13 References
7.1 Aims and Objectives
The aim of this lesson is to learn the concept of virtual storage management strategies.
The objectives of this lesson are to make the student aware of the following concepts
f) virtual storage management strategies
g) page replacement strategies
h) working sets
i) demand paging
j) and page size
7.2 Introduction
Virtual Memory is a technique which allows the execution of processes that may not be
completely in memory. The main advantage of this scheme is that user programs can be larger than
physical memory. The ability to execute a program which is only partially in memory would have
many benefits which includes (a) users can write programs for a very large virtual address space, (b)
more users can be run at the same time, with a corresponding increase in cpu utilization and
throughput, but no increase in response time or turnaround time, (c) less I/O would be needed to load
or swap each user into memory, so each user would run faster.
7.2.1 Overlays
Overlay is a technique which keeps in memory only those instructions and data that are
currently needed and when other instructions are needed they are loaded into space that was
previously occupied by instructions that are no longer needed.
7.2.2 Dynamic Loading
Here a routine is not loaded until it is called. The advantage of dynamic loading is that an
unused routine is never loaded. This scheme is particularly useful when large amounts of code are
needed to handle infrequently occurring cases.
7.3 Virtual storage management strategies
There are three main strategies namely
Fetch strategies – concerned with when a page or segment should be brought from secondary
to primary storage
Placement strategies – concerned with where in primary storage to place an incoming page or
segment
Replacement strategies – concerned with deciding which page or segment to displace to make
room for an incoming page or segment when primary storage is already fully committed
7.4 Steps in handling page fault
When a page is not available in main memory, a page fault occurs. When a page fault
occurs, the OS must perform a sequence of steps to bring the required page from the secondary
storage device into main memory. The steps in handling a page fault are as follows
a) First check whether the reference is valid or not from the internal table of process control
block (PCB)
b) Bring the page if it is not already loaded and the reference is valid
c) Find a free frame
d) Read the desired page into the newly allocated frame
e) Then modify the internal table in the PCB to indicate that the page is now available
f) Restart the instruction that was interrupted.
7.5 Page replacement algorithms
There are many page replacement algorithms and the most important three are FIFO, optimal
replacement and least recently used. This subsection explains the above three algorithms.
7.5.1 FIFO
The simplest page replacement algorithm is first in first out. In this scheme, when a page
must be replaced, the oldest page is chosen. For example consider the page reference string
1, 5, 6, 1, 7, 1, 5, 7, 6, 1, 5, 1, 7
For a three frame case, FIFO works as follows. Let all 3 frames be initially empty. Each
column below shows the contents of the three frames after a page fault (hits are not shown):

1   1   1   7   7   7   6   6
    5   5   5   1   1   1   7
        6   6   6   5   5   5

You can see that FIFO creates eight page faults.
7.5.2 Optimal replacement
In the optimal page replacement algorithm, we replace the page which will not be used for
the longest period of time. For example, for the reference string
1, 5, 6, 1, 7, 1, 5, 7, 6, 1, 5, 1, 7
with 3 frames, the page faults will be as follows (each column shows the frame contents
after a fault):

1   1   1   1   1   1
    5   5   5   5   5
        6   7   6   7

You can see that optimal replacement creates six page faults.
7.5.3 Least recently used
In most cases, predicting future page references is difficult, and hence implementing
optimal replacement is difficult. Hence there is a need for another scheme which approximates
optimal replacement. The least recently used (LRU) scheme approximates future use by past
use. In the LRU scheme, we replace the page which has not been used for the longest period
of time.
For example, for the reference string
1, 5, 6, 1, 7, 1, 5, 7, 6, 1, 5, 1, 7
with 3 frames, the page faults will be as follows (each column shows the frame contents
after a fault):

1   1   1   1   1   6   6   6   7
    5   5   7   7   7   7   5   5
        6   6   5   5   1   1   1

You can see that LRU creates nine page faults.
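The tables above can be checked with a small simulator. The following C sketch counts
page faults for FIFO and LRU with 3 frames on the same reference string and prints 8
and 9, matching the examples; the aging-counter representation is an illustrative
choice, not the only possible implementation.

#include <stdio.h>
#include <string.h>

#define FRAMES 3

static int simulate(const int *refs, int n, int lru)
{
    int frame[FRAMES], age[FRAMES];   /* age = time since load (FIFO) or
                                         time since last use (LRU) */
    int faults = 0;
    memset(frame, -1, sizeof frame);  /* all frames initially empty */
    memset(age, 0, sizeof age);
    for (int t = 0; t < n; t++) {
        int hit = -1;
        for (int f = 0; f < FRAMES; f++) {
            age[f]++;
            if (frame[f] == refs[t]) hit = f;
        }
        if (hit >= 0) {
            if (lru) age[hit] = 0;    /* LRU refreshes age on every use */
            continue;
        }
        faults++;
        int victim = -1;
        for (int f = 0; f < FRAMES; f++)        /* prefer an empty frame */
            if (frame[f] == -1) { victim = f; break; }
        if (victim < 0) {
            victim = 0;
            for (int f = 1; f < FRAMES; f++)    /* else evict the oldest */
                if (age[f] > age[victim]) victim = f;
        }
        frame[victim] = refs[t];
        age[victim] = 0;
    }
    return faults;
}

int main(void)
{
    const int refs[] = { 1, 5, 6, 1, 7, 1, 5, 7, 6, 1, 5, 1, 7 };
    int n = sizeof refs / sizeof refs[0];
    printf("FIFO faults: %d\n", simulate(refs, n, 0));
    printf("LRU  faults: %d\n", simulate(refs, n, 1));
    return 0;
}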
7.6 Working sets
If the number of frames allocated to a low-priority process falls below the minimum number
required, we must suspend its execution. We should then page out its remaining pages, freeing all
of its allocated frames. A process is thrashing if it is spending more time paging than executing.
Thrashing can cause severe performance problems. To prevent thrashing, we must provide a process
with as many frames as it needs. There are several techniques available to estimate how many
frames a process needs. The working set strategy starts by looking at what a program is actually
using. The set of pages referenced during the most recent Δ page references is the working set,
denoted ws. The accuracy of the working set depends upon the selection of the window size Δ. If
it is too small, page faults will increase, and if it is too large, then it is very difficult to
allocate the required frames.
For example, consider the reference strings
1 2 3 1 4 1 5 2 5 2 5 3 6 2 1 3 1 4 6 1 4 5      Δ = 11, ws = {2, 3, 1, 4, 5}
1 1 3 1 2 1 1 3 1 2 1 3 4 5 6 6 6 3              Δ = 11, ws = {1, 3, 2}
You can see the working set (ws) at two different times for the window size Δ = 11. [The working
set contains the pages the process has used during the last Δ references.] So at the
maximum the above given example needs at least 5 frames, otherwise page faults will occur. In most
of the cases we will allocate the number of frames to a process depending on the average working set
size.
Let Si be the average working set size for each process i in the system. Then
D = Σ Si
is the total demand for frames; process i needs Si frames. If the total demand is greater than the
total number of available frames, thrashing will occur.
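As a small illustration of the definition, the following C sketch computes the working
set over the last Δ references of a reference string. The array representation, the
MAXPAGE bound and the choice to take the window at the end of the string are assumptions
made for illustration (the snapshots in the example above are taken at other points).

#include <stdio.h>
#include <string.h>

#define MAXPAGE 64

/* Returns |ws(t, delta)| and fills ws[] with page-membership flags. */
int working_set_size(const int *refs, int t, int delta, char ws[MAXPAGE])
{
    int size = 0;
    int start = (t - delta + 1 > 0) ? t - delta + 1 : 0;
    memset(ws, 0, MAXPAGE);
    for (int i = start; i <= t; i++)        /* scan the last delta references */
        if (!ws[refs[i]]) { ws[refs[i]] = 1; size++; }
    return size;
}

int main(void)
{
    const int refs[] = { 1, 2, 3, 1, 4, 1, 5, 2, 5, 2, 5,
                         3, 6, 2, 1, 3, 1, 4, 6, 1, 4, 5 };
    int n = sizeof refs / sizeof refs[0];
    char ws[MAXPAGE];
    printf("|ws| at the end, delta = 11: %d\n",
           working_set_size(refs, n - 1, 11, ws));
    return 0;
}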
7.7 Demand paging
Demand paging is the most common virtual memory system. Demand paging is similar to a
paging system with swapping: when we need a program, it is swapped in from backing storage.
There are also lazy swappers, which never swap a page into memory unless it is needed. The lazy
swapper decreases the swap time and the amount of physical memory needed, allowing an increased
degree of multiprogramming.
7.8 Page size
There is no single best page size. The designers of the operating system will decide the page
size for a given machine. Page sizes are usually powers of two, ranging from 2^8 to 2^12 bytes
or words. The size of the pages affects the system in the following ways.
a) Decreasing the page size increases the number of pages and hence the size of the page
table.
b) Memory is utilized better with smaller pages.
c) For reducing the I/O time we need to have a smaller page size.
d) To minimize the number of page faults, we need to have a large page size.
7.9 Let us Sum Up
In this lesson we have learnt about the concept of virtual storage management
strategies like page replacement strategies, working sets, demand paging, and page size
7.10 Points for discussion
After learning this lesson, try to discuss among your friends and answer these
questions to check your progress.
a) What is the use of demand paging
b) Differentiate between optimal replacement and LRU
7.11 Model answers to check your progress
Demand paging is the most common virtual memory system. Demand paging is similar to a
paging system with swapping: when we need a program, it is swapped in from backing storage.
There are also lazy swappers, which never swap a page into memory unless it is needed. The lazy
swapper decreases the swap time and the amount of physical memory needed, allowing an increased
degree of multiprogramming.
7.12 Lesson end Activities
For the reference string
1, 5, 6, 1, 7, 1, 5, 7, 6, 1, 5, 1, 7
find the page fault rate for various page replacement algorithms with number of frame as 5.
7.13 References
 Charles Crowley, Chapter 11, 12 of “Operating Systems – A Design-Oriented
Approach”, Tata McGraw-Hill, 2001
 H.M. Deitel, Chapter 7, 8, 9 of “Operating Systems”, Second Edition, Pearson
Education, 2001
 Andrew S. Tanenbaum, Chapter 3 of “Modern Operating Systems”, PHI, 1996
 D.M. Dhamdhere, Chapter 15 of “Systems Programming and Operating Systems”,
Tata McGraw-Hill, 1997
UNIT – III
LESSON – 8: PROCESSOR MANAGEMENT
CONTENTS
8.1 Aims and Objectives
8.2 Introduction
8.3 Preemptive Vs Non-Preemptive scheduling
8.4 Priorities
8.5 Deadline scheduling
8.6 Let us Sum Up
8.7 Points for discussion
8.8 Model answer to Check your Progress
8.9 Lesson end Activities
8.10 References
8.1 Aims and Objectives
A multiprogramming operating system allows more than one process to be loaded into
executable memory at a time and allows the loaded processes to share the CPU using time
multiplexing. Part of the reason for using multiprogramming is that the operating system itself
is implemented as one or more processes, so there must be a way for the operating system
and application processes to share the CPU. Another main reason is the need for processes to
perform I/O operations in the normal course of computation. Since I/O operations ordinarily
require orders of magnitude more time to complete than do CPU instructions,
multiprogramming systems allocate the CPU to another process whenever a process invokes an
I/O operation.
Make sure your scheduling strategy is good enough with the following criteria:
 Utilization/Efficiency: keep the CPU busy 100% of the time with useful work
 Throughput: maximize the number of jobs processed per hour
 Turnaround time: from the time of submission to the time of completion; minimize
the time batch users must wait for output
 Waiting time: sum of times spent in the ready queue - minimize this
 Response time: time from submission till the first response is produced; minimize
response time for interactive users
 Fairness: make sure each process gets a fair share of the CPU
The aim of this lesson is to learn the concept of processor management and related issues.
The objectives of this lesson are to make the student aware of the following concepts
k) preemptive scheduling
l) Non preemptive scheduling
m) Priorities
n) and deadline scheduling
8.2 Introduction
When one or more process is runnable, the operating system must decide which one
to run first. The part of the operating system that makes decision is called the Scheduler; the
algorithm it uses is called the Scheduling Algorithm.
An operating system has three main CPU schedulers namely the long term scheduler,
short term scheduler and medium term schedulers. The long term scheduler determines
which jobs are admitted to the system for processing. It selects jobs from the job pool and
loads them into memory for execution. The short term scheduler selects from among the jobs
in memory which are ready to execute and allocates the cpu to one of them. The medium
term scheduler helps to remove processes from main memory and from the active contention
for the cpu, and thus reduces the degree of multiprogramming.
The cpu scheduler has another component called the dispatcher. It is the module that
actually gives control of the cpu to the process selected by the short term scheduler, which
involves loading of registers of the process, switching to user mode and jumping to the proper
location.
Before looking at specific scheduling algorithms, we should think about what the
scheduler is trying to achieve. After all the scheduler is concerned with deciding on policy,
not providing a mechanism. Various criteria come to mind as to what constitutes a good
scheduling algorithm. Some of the possibilities include:
1. Fairness – make sure each process gets its fair share of the CPU.
2. Efficiency (CPU utilization) – keep the CPU busy 100 percent of the time.
3. Response Time [Time from the submission of a request until the first response is
produced] – minimize response time for interactive users.
4. Turnaround time [The interval from the time of submission to the time of completion]
– minimize the time batch users must wait for output.
5. Throughput [Number of jobs that are completed per unit time] – maximize the
number of jobs processed per hour.
6. Waiting time – minimize the waiting time of jobs
8.3 Preemptive Vs Non-Preemptive
The Strategy of allowing processes that are logically runnable to be temporarily
suspended is called Preemptive Scheduling. ie., a scheduling discipline is preemptive if the
CPU can be taken away. Preemptive algorithms are driven by the notion of prioritized
computation. The process with the highest priority should always be the one currently using
the processor. If a process is currently using the processor and a new process with a higher
priority enters the ready list, the process on the processor should be removed and returned to
the ready list until it is once again the highest-priority process in the system.
Run to completion is also called Nonpreemptive Scheduling, i.e., a scheduling
discipline is nonpreemptive if, once a process has been given the CPU, the CPU cannot be
taken away from that process. In short, non-preemptive algorithms are designed so that once
a process enters the running state (is allocated the processor), it is not removed from the
processor until it has completed its service time (or it explicitly yields the processor).
Preemption, by contrast, leads to race conditions and necessitates semaphores, monitors,
messages or some other sophisticated method for preventing them. On the other hand, a
policy of letting a process run as long as it wants would mean that some process computing π
to a billion places could deny service to all other processes indefinitely.
8.4 Priorities
A priority is associated with each job, and the cpu is allocated to the job with the
highest priority. Priorities are generally some fixed numbers such as 0 to 7 or 0 to 4095.
However there is no general agreement on whether 0 is the highest or lowest priority.
Priority can be defined either internally or externally. Examples of internal priorities
are time limits, memory requirements, number of open files, average I/O burst time, CPU
burst time, etc. External priorities are given by the user.
A major problem with priority scheduling algorithms is indefinite blocking or
starvation. A solution to this problem is aging. Aging is a technique of gradually increasing
the priority of jobs that wait in the system for a long time.
8.5 Deadline scheduling
Certain jobs have to be completed in a specified time and hence are scheduled based
on deadlines. If delivered in time, the results have high value; otherwise they have little or
no value. Deadline scheduling is complex for the following reasons
a) Giving resource requirements of the job in advance is difficult
b) A deadline job should be run without degrading other deadline jobs
c) In the event of arriving new jobs, it is very difficult to carefully plan resource
requirements
d) Resource management for deadline scheduling is really an overhead
8.6 Let us Sum Up
In this lesson we have learnt about
a) preemptive scheduling
b) Nonpreemptive scheduling
c) Priorities
d) and deadline scheduling
8.7 Points for discussion
a) Why CPU scheduling is important?
b) How to evaluate scheduling algorithm?
8.8 Model answer to Check your Progress
The answers for the questions in 8.7 are discussed below.
a) Because it can have a big effect on resource utilization and the overall
performance of the system.
b) There are many possible criteria:
a. CPU Utilization: Keep CPU utilization as high as possible. (What is
utilization, by the way?).
b. Throughput: number of processes completed per unit time.
c. Turnaround Time: mean time from submission to completion of process.
d. Waiting Time: Amount of time spent ready to run but not running.
e. Response Time: Time between submission of requests and first response to
the request.
f. Scheduler Efficiency: The scheduler doesn't perform any useful work, so
any time it takes is pure overhead. So, need to make the scheduler very
efficient.
8.9 Lesson end Activities
After learning this chapter, try to discuss among your friends and answer these
questions to check your progress.
a) What is CPU scheduling?
b) Discuss about deadline scheduling. How to evaluate scheduling algorithm?
8.10 References
 Charles Crowley, Chapter 8 of “Operating Systems – A Design-Oriented Approach”,
Tata McGraw-Hill, 2001
 H.M. Deitel, Chapter 10 of “Operating Systems”, Second Edition, Pearson Education,
2001
 Andrew S. Tanenbaum, Chapter 11 of “Modern Operating Systems”, PHI, 1996
 D.M. Dhamdhere, Chapter 9 of “Systems Programming and Operating Systems”, Tata
McGraw-Hill, 1997
LESSON – 9: PROCESSOR SCHEDULING
CONTENTS
9.1 Aims and Objectives
9.2 Introduction
9.3 First In First out (FIFO)
9.4 Round Robin Scheduling
9.5 Quantum size
9.6 Shortest Job First (SJF)
9.7 Shortest remaining time first (SRF)
9.8 Highest response ratio next (HRN)
9.9 Let us Sum Up
9.10 Points for discussion
9.11 Model answers to Check your Progress
9.12 Lesson - end Activities
9.13 References
9.1 Aims and Objectives
The aim of this lesson is to learn the concept of processor scheduling and scheduling
algorithms.
The objectives of this lesson are to make the student aware of the following concepts
o) FIFO
p) Round Robin
q) Shortest Job first
r) Shortest remaining time first
s) Highest response ratio next (HRN)
9.2 Introduction
Different algorithms have different properties and may favor one class of processes
over another. In choosing the best algorithm, the characteristics explained in the previous
lesson must be considered, which include CPU utilization, throughput, turnaround time, waiting
time, and response time.
The five most important algorithms used in the CPU scheduling are FIFO, Round
Robin, Shortest Job first, Shortest remaining time first and Highest response ratio next
(HRN). The following sections describe each of them.
9.3 First In First out (FIFO)
A low overhead paging algorithm is the FIFO (First in, First Out) algorithm. To
illustrate how this works, consider a supermarket that has enough shelves to display exactly k
different products. One day, some company introduces a new convenience food-instant,
freeze-dried, organic yogurt that can be reconstituted in a microwave oven. It is an immediate
success, so our finite supermarket has to get rid of one old product in order to stock it.
One possibility is to find the product that the supermarket has been stocking the
longest and get rid of it on the grounds that no one is interested anymore. In effect, the
supermarket maintains the linked list of all the products it currently sells in the order they
were introduced. The new one goes on the back of the list; the one at the front of the list is
dropped.
As a page replacement algorithm, the same idea is applicable. The operating system
maintains a list of all pages currently in memory, with the page at the head of the list the
oldest one and the page at the tail the most recent arrival. On a page fault, the page at the
head is removed and the new page added to the tail of the list. When applied to stores, FIFO
might remove mustache wax, but it might also remove flour, salt or butter. When applied to
computers the same problems arise. For this reason, FIFO in its pure form is rarely used.
Consider for example, the following scenario of four jobs and the corresponding CPU
burst time arrived in the order of job number.
Job      Burst time
1        20
2        10
3        5
4        15
The FCFS algorithm allocates the jobs to the cpu in the order of their arrival, and the
following Gantt chart shows the result of execution.

| Job 1 | Job 2 | Job 3 | Job 4 |
0       20      30      35      50
The waiting times of jobs are
Job 1    0
Job 2    20
Job 3    30
Job 4    35
--------------------------------------------
Total waiting time = 85
Hence the average waiting time is 21.25.
The turnaround times of jobs are
Job 1    20
Job 2    30
Job 3    35
Job 4    50
--------------------------------------------
Total turnaround time = 135
Hence the average turnaround time is 33.75.
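The FCFS arithmetic above is easy to reproduce in code. The following is a minimal C
sketch for the four jobs, all assumed to arrive at time 0; it prints average waiting
time 21.25 and average turnaround time 33.75.

#include <stdio.h>

int main(void)
{
    const int burst[] = { 20, 10, 5, 15 };
    const int n = sizeof burst / sizeof burst[0];
    int clock = 0, total_wait = 0, total_tat = 0;

    for (int i = 0; i < n; i++) {      /* jobs run in arrival order */
        int wait = clock;              /* time spent in the ready queue */
        clock += burst[i];             /* the job runs to completion */
        printf("Job %d: waiting %2d, turnaround %2d\n", i + 1, wait, clock);
        total_wait += wait;
        total_tat += clock;            /* completion - arrival(=0) */
    }
    printf("average waiting %.2f, average turnaround %.2f\n",
           total_wait / (double)n, total_tat / (double)n);
    return 0;
}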
9.4 Round Robin Scheduling
One of the oldest, simplest, fairest and most widely used algorithms is Round Robin.
Each process is assigned a time interval, called the Quantum, which it is allowed to run. If the
process is still running at the end of the quantum, the CPU is preempted and given to another
process. If the process has blocked or finished before the quantum has elapsed, the CPU
switching is done. Round robin is easy to implement: all the scheduler has to do is maintain a
list of runnable processes.
Figure: the round-robin run queue. Initially the list is B, F, D, G, A; when B's quantum
expires, B moves to the tail and the list becomes F, D, G, A, B.
The only interesting issue with the round robin is the length of the quantum.
Switching from one process to another requires a certain amount of time for doing the
administration – saving and loading registers and memory maps, updating various tables and
lists, etc. suppose that this process switch or context switch takes 5 msecs. Also suppose the
quantum is set say 20 msecs. With these parameters, after doing 20 msecs of useful work, the
CPU will have to spend 5 msecs on process switching. Twenty percent of the CPU time will
be wasted on administrative overhead.
Consider for example, the following scenario of four jobs and the corresponding CPU burst
time arrived in the order of job number.
Job      Burst time
1        20
2        10
3        5
4        15
The RR algorithm allocates a quantum of time to each job in rotation, and the following
Gantt chart shows the result of execution. Let the time quantum be 5.

| Job 1 | Job 2 | Job 3 | Job 4 | Job 1 | Job 2 | Job 4 | Job 1 | Job 4 | Job 1 |
0       5      10      15      20      25      30      35      40      45      50
The waiting times of jobs are
Job 1    0 + 15 + 10 + 5    = 30
Job 2    5 + 15             = 20
Job 3    10                 = 10
Job 4    15 + 10 + 5        = 30
--------------------------------------------------------------
Total waiting time = 90
Hence the average waiting time is 22.5.
The turnaround times of jobs are
Job 1    50
Job 2    30
Job 3    15
Job 4    45
--------------------------------------------
Total turnaround time = 140
Hence the average turnaround time is 35.
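The round robin figures above can likewise be reproduced. The following C sketch
simulates the four jobs with a quantum of 5; because all jobs arrive at time 0, the
simple fixed-order scan below visits jobs in the same order as a FIFO ready queue
would. It prints average waiting time 22.50 and average turnaround time 35.00.

#include <stdio.h>

#define NJOBS 4

int main(void)
{
    const int burst[NJOBS] = { 20, 10, 5, 15 };
    int remaining[NJOBS] = { 20, 10, 5, 15 };
    int finish[NJOBS] = { 0 };
    const int quantum = 5;
    int clock = 0, done = 0;

    while (done < NJOBS) {                 /* cycle through the queue */
        for (int i = 0; i < NJOBS; i++) {
            if (remaining[i] == 0) continue;
            int slice = remaining[i] < quantum ? remaining[i] : quantum;
            clock += slice;                /* job i runs one quantum */
            remaining[i] -= slice;
            if (remaining[i] == 0) {       /* job i just finished */
                finish[i] = clock;
                done++;
            }
        }
    }
    double wait = 0, tat = 0;
    for (int i = 0; i < NJOBS; i++) {
        tat += finish[i];                  /* all jobs arrive at time 0 */
        wait += finish[i] - burst[i];      /* waiting = turnaround - burst */
    }
    printf("average waiting %.2f, average turnaround %.2f\n",
           wait / NJOBS, tat / NJOBS);
    return 0;
}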
9.5 Quantum size
In round robin scheduling algorithm, no process is allocated the cpu for more than one
time quantum in a row. If its cpu burst exceeds a time quantum, it is preempted and put back
in the ready queue.
If the time quantum is too large, round robin becomes equivalent to FCFS. If the time
quantum is too small, in terms of microseconds, round robin is called processor sharing, and
it appears as if each of n processes has its own processor running at 1/n the speed of the
real processor.
The time quantum size must be large with respect to the context switch time. If the
context switch time is approximately 5 percent of the time quantum, then the cpu will spend
roughly 5 percent of its time on context switching.
9.6 Shortest Job First (SJF)
Most of the above algorithms were designed for interactive systems. Now let us look
at one that is especially appropriate for batch jobs where the run times are known in advance.
In an insurance company, for example, people can predict quite accurately how long it will
take to run a batch of 1000 claims, since similar work is done every day. When several
equally important jobs are sitting in the input queue waiting to be started, the scheduler
should use shortest job first.
Here we find four jobs A, B, C and D with run times of 8, 4, 6 and 5 minutes
respectively. Running them in that order gives

| A (8) | B (4) | C (6) | D (5) |

and the turnaround time for A is 8 minutes, for B 12 minutes, for C 18 minutes and for D 23
minutes, for an average of 15.25 minutes. Now if we run the shortest job first, as follows

| B (4) | D (5) | C (6) | A (8) |
The turnaround times are now 4, 9, 15 and 23 minutes for an average of 12.75
minutes.
Shortest job first is provably optimal. Consider the case of four jobs, with runtimes of a, b, c
and d respectively. The first job finishes at time a, the second finishes at time a+b and so on.
The mean turnaround time is (4a+3b+2c+d)/4. It is clear that a contributes more to the
average than the others, so it should be the shortest job, with b next, then c, and finally d as
the longest, as it affects only its own turnaround time. (Check: with a=4, b=5, c=6, d=8 this
gives (16+15+12+8)/4 = 12.75 minutes, matching the example above.) The same argument applies
equally well to any number of jobs.
Consider for example, the following scenario of four jobs and the corresponding CPU burst
time arrived in the order of job number.
Job      Burst time
1        20
2        10
3        5
4        15

The algorithm allocates the shortest job first.

| Job 3 | Job 2 | Job 4 | Job 1 |
0       5      15      30      50
The waiting times of jobs are
Job 1    30
Job 2    5
Job 3    0
Job 4    15
--------------------------------------
Total waiting time = 50
Hence the average waiting time is 12.5.
The turnaround times of jobs are
Job 1    50
Job 2    15
Job 3    5
Job 4    30
--------------------------------------------
Total turnaround time = 100
Hence the average turnaround time is 25.
9.7 Shortest remaining time first (SRF)
Shortest job first may be either preemptive or non-preemptive. When a new job with a
shorter cpu burst time arrives at the ready queue while a previous job is still executing, a
preemptive shortest-job-first algorithm will preempt the currently executing job, whereas a
non-preemptive shortest-job-first algorithm will allow the currently running job to finish its
cpu burst. The preemptive shortest-job-first algorithm is also called shortest remaining time
first.
Consider for example, the following scenario of four jobs and the corresponding CPU burst
time and arrival time.
Job      Burst time      Arrival time
1        20              0
2        10              2
3        5               4
4        15              19
The algorithm allocates the jobs as shown in the Gantt chart.

| Job 1 | Job 2 | Job 3 | Job 2 | Job 1 | Job 4 | Job 1 |
0       2       4       9      17      19      34      50
The waiting times of jobs (time spent in the ready queue after arrival) are
Job 1    15 + 15      = 30
Job 2    5            = 5
Job 3    0            = 0
Job 4    0            = 0
--------------------------------------
Total waiting time = 35
Hence the average waiting time is 8.75.
The turnaround times of jobs (completion time minus arrival time) are
Job 1    50 - 0       = 50
Job 2    17 - 2       = 15
Job 3    9 - 4        = 5
Job 4    34 - 19      = 15
--------------------------------------------
Total turnaround time = 85
Hence the average turnaround time is 21.25.
9.8 Highest response ratio next (HRN)
HRN is a nonpreemptive scheduling algorithm which considers both the CPU burst
time and the waiting time. The priority of a job in HRN is calculated as

priority = (waiting time + service time) / service time

where service time is the next cpu burst time. Here shorter jobs get higher priority, since
service time appears in the denominator; and since waiting time appears in the numerator,
longer-waiting jobs will also gain priority.
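A small C sketch of HRN selection follows. The job table, the waiting and service
times in it, and the function names are illustrative assumptions.

#include <stdio.h>

struct job { const char *name; double waiting, service; };

/* priority = (waiting time + service time) / service time */
static double hrn_priority(const struct job *j)
{
    return (j->waiting + j->service) / j->service;
}

int main(void)
{
    struct job jobs[] = {
        { "A", 10, 20 },   /* ratio (10 + 20) / 20 = 1.5 */
        { "B", 30, 10 },   /* ratio (30 + 10) / 10 = 4.0, chosen */
        { "C",  5,  5 },   /* ratio (5 + 5) / 5    = 2.0 */
    };
    int best = 0;
    for (int i = 1; i < 3; i++)
        if (hrn_priority(&jobs[i]) > hrn_priority(&jobs[best]))
            best = i;
    printf("HRN dispatches job %s (ratio %.2f)\n",
           jobs[best].name, hrn_priority(&jobs[best]));
    return 0;
}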
9.9 Let us Sum Up
In this chapter we have learned about various scheduling algorithms like FIFO, RR,
SJF, SRT and HRN.
9.10 Points for discussion
a) Consider the performance of the FCFS algorithm for three compute-bound processes
P1 (takes 24 seconds), P2 (takes 3 seconds) and P3 (takes 3 seconds). If they arrive in
the order P1, P2, P3, what are the waiting time, turnaround time and throughput?
b) What about if the processes come in the order P2, P3, P1?
c) What happens with a really small quantum?
d) What about having a really small quantum supported in hardware?
e) What about a really big quantum?
9.11 Model answers to Check your progress
The answers for the questions given in 9.10 are
a) Waiting time: (0 + 24 + 27) / 3 = 17. Turnaround time: (24 + 27 + 30) / 3 = 27.
Throughput: 3 jobs in 30 seconds, i.e., one job per 10 seconds on average.
b) Waiting time: (0 + 3 + 6) / 3 = 3. Turnaround time: (3 + 6 + 30) / 3 = 13.
Throughput: 3 jobs in 30 seconds, i.e., one job per 10 seconds on average.
c) It looks like you've got a CPU that is 1/n as powerful as the real CPU, where n is
the number of processes. The problem with a small quantum is context switch
overhead.
d) Then you have something called multithreading. Give the CPU a bunch of
registers and heavily pipeline the execution. Feed the processes into the pipe one
by one. Treat memory access like I/O - suspend the thread until the data comes
back from the memory; in the meantime, execute other threads. Use computation
to hide the latency of accessing memory.
e) It turns into FCFS. Rule of thumb: you want 80 percent of CPU bursts to be shorter
than the time quantum.
9.12 Lesson end Activities
Try to write C/C++ programs to implement FIFO, RR, SJF, SRT and HRN
9.13 References
 Charles Crowley, Chapter 8 of “Operating Systems – A Design-Oriented Approach”,
Tata McGraw-Hill, 2001
 H.M. Deitel, Chapter 10 of “Operating Systems”, Second Edition, Pearson Education,
2001
 Andrew S. Tanenbaum, Chapter 11 of “Modern Operating Systems”, PHI, 1996
 D.M. Dhamdhere, Chapter 9 of “Systems Programming and Operating Systems”, Tata
McGraw-Hill, 1997
 Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard.
LESSON – 10: DISTRIBUTED COMPUTING
CONTENTS
10.1 Aims and Objectives
10.2 Introduction
10.3 Classification of sequential and parallel processing
10.4 Array Processors
10.4.1 History
10.5 Dataflow computer
10.6 Let us Sum Up
10.7 Points for discussion
10.8 Model Answer to Check your Progress
10.9 Lesson - end Activities
10.10 References
10.1 Aims and Objectives
The aim of this lesson is to learn the concept of distribute computing and parallel computing.
The objectives of this lesson are to make the student aware of the following concepts
t) sequential processing
u) parallel processing
v) Array Processors
w) and Dataflow computer
10.2 Introduction
Parallel computing is the simultaneous execution of some combination of multiple
instances of programmed instructions and data on multiple processors in order to obtain
results faster. The idea is based on the fact that the process of solving a problem usually can
be divided into smaller tasks, which may be carried out simultaneously with some
coordination. The technique was first put to practical use by ILLIAC IV in 1976, fully a
decade after it was conceived.
A parallel computing system is a computer with more than one processor for parallel
processing. In the past, each processor of a multiprocessing system always came in its own
processor packaging, but recently-introduced multicore processors contain multiple logical
processors in a single package. There are many different kinds of parallel computers. They
are distinguished by the kind of interconnection between processors (known as "processing
elements" or PEs) and memory.
Parallel computers can be modelled as Parallel Random Access Machines (PRAMs).
The PRAM model ignores the cost of interconnection between the constituent computing
units, but is nevertheless very useful in providing upper bounds on the parallel solvability of
many problems. In reality the interconnection plays a significant role. The processors may
communicate and cooperate in solving a problem or they may run independently, often under
the control of another processor which distributes work to and collects results from them (a
"processor farm").
Processors in a parallel computer may communicate with each other in a number of
ways, including shared memory (either multiported or multiplexed); designs range across
parallel supercomputers, NUMA vs. SMP vs. massively parallel computer systems, and
distributed computing (esp. computer clusters and grid computing). According to Amdahl's law, parallel processing is less efficient
than one x-times-faster processor from a computational perspective. However, since power
consumption is a super-linear function of the clock frequency on modern processors, we are
reaching the point where from an energy cost perspective it can be cheaper to run many low
speed processors in parallel than a single highly clocked processor.
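The limit stated by Amdahl's law can be made concrete with a small calculation. The C sketch below computes the speedup 1/((1 - p) + p/n) for n processors; the parallel fraction p = 0.9 is an assumed value for illustration:

#include <stdio.h>

/* Amdahl's law: speedup on n processors when a fraction p of the
   work can be parallelized and the rest must run serially. */
double amdahl(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    for (int n = 1; n <= 64; n *= 2)
        printf("n = %2d  speedup = %.2f\n", n, amdahl(0.9, n));
    return 0;
}

Even with 64 processors the speedup stays below 1/(1 - 0.9) = 10, which is why adding processors alone cannot make a program arbitrarily fast.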
10.3 Classification of sequential and parallel processing
Flynn's taxonomy, one of the most accepted taxonomies of parallel architectures,
classifies parallel (and serial) computers according to: whether all processors execute the
same instructions at the same time (single instruction/multiple data -- SIMD) or whether each
processor executes different instructions (multiple instruction/multiple data -- MIMD).
SISD (single instruction stream, single data stream) machines are uni-processor
computers that process one instruction at a time and are the most common architecture
available today. Array processors essentially perform operations simultaneously on every
element of an array; this is the SIMD (single instruction stream, multiple data stream)
category. Another category is multiple instruction stream, single data stream (MISD), which is not in common use.
Parallel processors or multiprocessors can handle multiple instruction stream and multiple
data streams.
Another major way to classify parallel computers is based on their memory
architectures. Shared memory parallel computers have multiple processors accessing all
available memory as global address space. They can be further divided into two main classes
based on memory access times: Uniform Memory Access (UMA), in which access times to
all parts of memory are equal, or Non-Uniform Memory Access (NUMA), in which they are
not. Distributed memory parallel computers also have multiple processors, but each of the
processors can only access its own local memory; no global memory address space exists
across them. Parallel computing systems can also be categorized by the numbers of
processors in them. Systems with thousands of such processors are known as massively
parallel. A distinction is also made between "large scale" and "small scale" parallel
processors, depending on the size of the system; e.g., a PC-based parallel system would
generally be considered a small scale system. Parallel processor machines are also divided
into symmetric and asymmetric multiprocessors, depending on whether all the processors are
the same or not (for instance if only one is capable of running the operating system code and
others are less privileged).
A variety of architectures have been developed for parallel processing. For example a
Ring architecture has processors linked by a ring structure. Other architectures include
Hypercubes, Fat trees, systolic arrays, and so on.
10.4 Array Processors
A vector processor, or array processor, is a CPU design that is able to run
mathematical operations on multiple data elements simultaneously. This is in contrast to a
scalar processor which handles one element at a time. The vast majority of CPUs are scalar
(or close to it). Vector processors were common in the scientific computing area, where they
formed the basis of most supercomputers through the 1980s and into the 1990s, but general
increases in performance and processor design saw the near disappearance of the vector
processor as a general-purpose CPU.
Today most commodity CPU designs include some vector processing instructions,
typically known as SIMD (Single Instruction, Multiple Data); common examples include
SSE and AltiVec. Modern video game consoles and consumer computer-graphics hardware
rely heavily on vector processing in their architecture. In 2000, IBM, Toshiba and Sony
began collaborating on the Cell processor, consisting of one scalar processor and eight
vector processors, used in the Sony PlayStation 3.
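The effect of one instruction operating on several data elements can be seen directly with the SSE instructions mentioned above. The following C sketch adds two arrays four floats at a time using compiler intrinsics; it assumes an x86 compiler with SSE support and an array length that is a multiple of four:

#include <stdio.h>
#include <xmmintrin.h>                    /* SSE intrinsics */

int main(void)
{
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];

    for (int i = 0; i < 8; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);  /* load four floats at once */
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&c[i], _mm_add_ps(va, vb)); /* one add, four sums */
    }
    for (int i = 0; i < 8; i++)
        printf("%.0f ", c[i]);            /* prints 9 eight times */
    printf("\n");
    return 0;
}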
10.4.1 History
Vector processing was first worked on in the early 1960s at Westinghouse in their
Solomon project. Solomon's goal was to dramatically increase math performance by using a
large number of simple math co-processors (or ALUs) under the control of a single master
CPU. The CPU fed a single common instruction to all of the ALUs, one per "cycle", but with
a different data point for each one to work on. This allowed the Solomon machine to apply a
single algorithm to a large data set, fed in the form of an array. In 1962 Westinghouse
cancelled the project, but the effort was re-started at the University of Illinois as the ILLIAC
IV. Their version of the design originally called for a 1 GFLOPS machine with 256 ALUs,
but when it was finally delivered in 1972 it had only 64 ALUs and could reach only 100 to
150 MFLOPS. Nevertheless it showed that the basic concept was sound, and when used on
data-intensive applications, such as computational fluid dynamics, the "failed" ILLIAC was
the fastest machine in the world. It should be noted that the ILLIAC approach of using
separate ALUs for each data element is not common to later designs, and is often referred to
under a separate category, massively parallel computing.
The first successful implementation of vector processing appears to be the CDC
STAR-100 and the Texas Instruments Advanced Scientific Computer (ASC). The basic ASC
(i.e., "one pipe") ALU used a pipeline architecture which supported both scalar and vector
computations, with peak performance reaching approximately 20 MFLOPS, readily achieved
when processing long vectors. Expanded ALU configurations supported "two pipes" or "four
pipes" with a corresponding 2X or 4X performance gain. Memory bandwidth was sufficient
to support these expanded modes. The STAR was otherwise slower than CDC's own
supercomputers like the CDC 7600, but at data related tasks they could keep up while being
much smaller and less expensive. However the machine also took considerable time decoding
the vector instructions and getting ready to run the process, so it required very specific data
sets to work on before it actually sped anything up.
The vector technique was first fully exploited in the famous Cray-1. Instead of
leaving the data in memory like the STAR and ASC, the Cray design had eight "vector
registers" which held sixty-four 64-bit words each. The vector instructions were applied
between registers, which is much faster than talking to main memory. In addition the design
had completely separate pipelines for different instructions, for example, addition/subtraction
was implemented in different hardware than multiplication. This allowed a batch of vector
instructions themselves to be pipelined, a technique they called vector chaining. The Cray-1
normally had a performance of about 80 MFLOPS, but with up to three chains running it
could peak at 240 MFLOPS – a respectable number even today.
Other examples followed. CDC tried to re-enter the high-end market again with its
ETA-10 machine, but it sold poorly and they took that as an opportunity to leave the
supercomputing field entirely. Various Japanese companies (Fujitsu, Hitachi and NEC)
introduced register-based vector machines similar to the Cray-1, typically being slightly
faster and much smaller. Oregon-based Floating Point Systems (FPS) built add-on array
processors for minicomputers, later building their own minisupercomputers. However Cray
continued to be the performance leader, continually beating the competition with a series of
machines that led to the Cray-2, Cray X-MP and Cray Y-MP. Since then the supercomputer
market has focused much more on massively parallel processing rather than better
implementations of vector processors. However, recognizing the benefits of vector processing
IBM developed Virtual Vector Architecture for use in supercomputers coupling several scalar
processors to act as a vector processor.
Today the average computer at home crunches as much data watching a short
QuickTime video as did all of the supercomputers in the 1970s. Vector processor elements
have since been added to almost all modern CPU designs, although they are typically referred
to as SIMD. In these implementations the vector processor runs beside the main scalar CPU,
and is fed data from programs that know it is there.
10.5 Dataflow computer
Dataflow computing is a subject of considerable interest, since this class of computer
architectures exposes a high level of parallelism.
The dataflow computer system VSPD-1 presented here is a research prototype of a
static dataflow architecture according to Veen's classification. Static partitioning,
linking, and distribution of actors among processing modules must be done at the stage of
development of a dataflow application for VSPD-1.
VSPD-1 is designed to scale up to 16 processing nodes. The research prototype of
VSPD-1 implemented at the Computer Engineering Department, LETI (Leningrad Institute
of Electrical Engineering), consists of five processing modules (PM). Each VSPD-1
processing module consists of
- a microcomputer MS11200.5 with a 1-MIPS microprocessor and 32 KB of RAM;
- a PROM for a loader and a kernel of a dataflow run-time system (called here the
monitor);
- communication Qbus-Qbus adapters (CA);
- an I/O processor (IOP) that supports up to eight DMA channels for internode
communication;
- an auxiliary control unit that implements most of the operations on dataflow control.
Restrictions of physical realization motivated the following configuration of the five-module system: four processing modules are configured in a ring, and the fifth module is
connected with all other PMs. The fifth PM uses a communication adapter that connects the
processing module to a host computer DVK-4. The structure of the VSPD processing module
is presented in Fig.1.
[Fig. 1: Structure of the VSPD processing module]
Dataflow computation is realized at the level of labeled code fragments called actors.
An actor is ready and can be executed when it receives all the data (operands) required for its
execution. An application for VSPD-1 must be written in a specific dataflow style where
actors are programmed in Pascal with macros in Macro-11 assembler. Each labeled code
segment that specifies an actor ends with an assembler macro which passes a pointer to a
destination list to the kernel and invokes it. The kernel sends data produced by the actor to
destination actors specified in the list.
A label of the actor serves as its starting address (activation point). Addresses of
destination actors that will receive the result tokens are stored in a destination table that
actually holds a dataflow graph of the application. A dataflow program also includes a table
of starting addresses of actors, and a table of ready-bit vectors. A ready-bit vector indicates
for each of the actor's inputs whether a token has arrived at that input. Before computation starts,
the ready bit table contains initial mapping of tokens to actor inputs. A special mask stored
together with a ready-bit vector indicates the inputs which are always ready (constant or not
in use).
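The readiness test described above can be sketched in a few lines of C. The field widths and names below are assumptions for illustration; in VSPD-1 the corresponding tables live in the Control Unit's table memory:

#include <stdint.h>
#include <stdio.h>

/* An actor with up to 16 inputs: one ready bit per input, plus a
   mask marking inputs that are always ready (constant or unused). */
typedef struct {
    uint16_t ready;  /* bit i is set when a token has arrived at input i */
    uint16_t mask;   /* bit i is set when input i is always ready        */
} actor_state;

int actor_is_ready(const actor_state *a)
{
    /* ready when every input is either masked or has received a token */
    return (uint16_t)(a->ready | a->mask) == 0xFFFFu;
}

int main(void)
{
    actor_state a = { 0x0003, 0xFFFC };  /* two real inputs, both ready */
    printf("ready: %d\n", actor_is_ready(&a));
    return 0;
}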
The system supports both explicit and implicit transfer of data from source actors to
destination actors. With explicit transfer, data sent to a destination actor are followed
by a result token, whose role is to indicate that the data has been sent to the destination. A token
is 16 bits wide and consists of the following fields: the data type tag (T), which indicates
whether the data item sent is a vector or a scalar; the number of the destination processor
module (P); the number of the destination actor (A); and the number of the actor input to which
the token is directed (I). In this way, the parameters for an explicit data transfer consist of two
words: a 16-bit token and a pointer to a data structure to be sent to the destination. The
format of a destination list entry is shown in Fig.2.
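Since the lesson gives the token's fields but not their widths, the following C sketch packs and unpacks a 16-bit token with assumed widths (1-bit type tag, 3-bit processor number, 8-bit actor number, 4-bit input number):

#include <stdint.h>
#include <stdio.h>

/* Assumed layout: T in bit 15, P in bits 14-12, A in bits 11-4,
   I in bits 3-0. The real VSPD-1 layout may differ. */
static uint16_t make_token(unsigned t, unsigned p, unsigned a, unsigned i)
{
    return (uint16_t)(((t & 0x1u) << 15) | ((p & 0x7u) << 12) |
                      ((a & 0xFFu) << 4) | (i & 0xFu));
}

int main(void)
{
    uint16_t tok = make_token(0, 4, 17, 2); /* scalar to PM 4, actor 17, input 2 */
    printf("token = 0x%04X, actor = %u\n", tok, (tok >> 4) & 0xFFu);
    return 0;
}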
Implicit data transfer is used for local data flow between actors within a processing
module, and it is implemented through shared variables. Tokens are used for synchronization
of parallel processes in processing modules. Tokens that follow data form a control flow,
since each token indicates which destination actor can be performed after a source actor. The
actor is ready when it has collected tokens on all its inputs. The amount of token traffic can
be less than that of data flow. For example, when a source actor sends more than one data
item to a destination actor, it is enough to send only one token after all data are sent in the
vector form. If data is directed to remote actors located on the same remote node, data and all
tokens are sent in one message.
Token flow and data flow are controlled by a distributed software run-time system
together with the auxiliary hardware Control Unit (CU) in each processing node. The
auxiliary Control Unit consists of a register file (RF), a table memory (TM), an operational
unit (OU), a ready-actor queue (RAQ), a microprogrammed controller (MPC), an address selector
(AS) and bus adapters (BA). The table memory is split into three regions that are used to
store three tables: the table of starting addresses; the table of ready-bit vectors, and the table
of destination addresses. A structure of CU is shown in Fig.3.
[Fig. 3: Structure of the auxiliary Control Unit (CU)]
The auxiliary Control Unit receives a token and sets an appropriate ready-bit in the
ready-bit table to indicate that the token has arrived. If the destination actor is ready, its
number is inserted into the ready-actor queue. On a request from the software kernel, the CU
fetches a ready actor (if any) from the head of the queue and looks up the starting address
table. The starting address of the ready actor is returned to the kernel, which invokes the actor.
After the actor completes, the kernel resets the actor's ready-bit vector masked by the always-ready
mask, and sends results and tokens to destination actors according to a destination list stored
in the destination address table of CU. Sending of data and tokens to remote processing
modules is performed by an I/O co-processor and communication adapters. A message that
includes data (vector or scalar), is passed to the destination processing unit via DMA. When
data transmission completes, a corresponding token is passed to the auxiliary Control Unit of
the destination processing module. A DMA engine is controlled by the IO co-processor. The
VSPD-1 system can be used, in particular, as a functional accelerator for a minicomputer
with a PDP-11 architecture for applications that require high performance computing, such as
simulation of complex dynamic objects in real-time.
10.6 Let us Sum Up
In this lesson we have learnt about
a) Sequential and Parallel processing
b) Array processors
c) dataflow computers
10.7 Points for discussion
Discuss the following:
a) the working of an array (vector) processor
b) the working of a dataflow computer
10.8 Model Answer to Check your Progress
The answer to question a) in section 10.7 can be written as follows.
A vector processor, or array processor, is a CPU design that is able to run
mathematical operations on multiple data elements simultaneously. This is in contrast to a
scalar processor which handles one element at a time. The vast majority of CPUs are scalar
(or close to it). Vector processors were common in the scientific computing area, where they
formed the basis of most supercomputers through the 1980s and into the 1990s, but general
increases in performance and processor design saw the near disappearance of the vector
processor as a general-purpose CPU. Today most commodity CPU designs include some
vector processing instructions, typically known as SIMD (Single Instruction, Multiple Data);
common examples include SSE and AltiVec. Modern video game consoles and consumer
computer-graphics hardware rely heavily on vector processing in their architecture. In 2000,
IBM, Toshiba and Sony collaborated to create a Cell processor, consisting of one scalar
processor and eight vector processors, for the Sony PlayStation 3.
10.9 Lesson - end Activities
After learning this chapter, try to discuss among your friends and answer these
questions to check your progress.
a) Discuss about sequential processing
b) Discuss about parallel processing
10.10 References
 Charles Crowley, Chapter 6 of “Operating Systems – A Design-Oriented Approach”,
Tata McGraw-Hill, 2001
 H.M. Deitel, Chapter 11 of “Operating Systems”, Second Edition, Pearson Education,
2001
 Andrew S. Tanenbaum, Chapter 9, 10, 11, 12, 13 of “Modern Operating Systems”,
PHI, 1996
 D.M. Dhamdhere, Chapter 19 of “Systems Programming and Operating Systems”,
Tata McGraw-Hill, 1997
LESSON – 11: MULTIPROCESSING AND FAULT TOLERANCE
CONTENTS
11.1 Aims and Objectives
11.2 Introduction
11.3 Multiprocessing techniques
11.3.1 Instruction and data streams
11.3.2 Processor coupling
11.3.3 SISD multiprocessing
11.3.4 SIMD multiprocessing
11.3.5 MISD multiprocessing
11.3.6 MIMD multiprocessing
11.4 Fault-tolerant
11.4.1 Fault Tolerance Requirements
11.4.2 Fault-tolerance by replication, Redundancy and Diversity
11.5 Let us sum up
11.6 Points for discussion
11.7 Model answers to Check your Progress
11.8 Lesson - end activities
11.9 References
11.1 Aims and Objectives
In this lesson we will learn about the introduction to multiprocessing and fault
tolerance.
The objectives of this lesson are to make the students aware of the following:
a) Multiprocessing techniques
b) Instruction and data streams
c) Processor coupling
d) SISD, SIMD, MISD, MIMD
e) Fault tolerance and its requirements
f) Fault-tolerance by replication, redundancy and diversity
11.2 Introduction
Multiprocessing is the use of two or more central processing units (CPUs) within a
single computer system. The term also refers to the ability of a system to support more than
one processor and/or the ability to allocate tasks between them. There are many variations on
this basic theme, and the definition of multiprocessing can vary with context, mostly as a
function of how CPUs are defined (multiple cores on one die, multiple chips in one package,
multiple packages in one system unit, etc.).
Multiprocessing sometimes refers to the execution of multiple concurrent software
processes in a system as opposed to a single process at any one instant. However, the term
multiprogramming is more appropriate to describe this concept, which is implemented mostly
in software, whereas multiprocessing is more appropriate to describe the use of multiple
hardware CPUs. A system can be both multiprocessing and multiprogramming, only one of
the two, or neither of the two.
In a multiprocessing system, all CPUs may be equal, or some may be reserved for
special purposes. A combination of hardware and operating-system software design
considerations determine the symmetry (or lack thereof) in a given system. For example,
hardware or software considerations may require that only one CPU respond to all hardware
interrupts, whereas all other work in the system may be distributed equally among CPUs; or
execution of kernel-mode code may be restricted to only one processor (either a specific
processor, or only one processor at a time), whereas user-mode code may be executed in any
combination of processors. Multiprocessing systems are often easier to design if such
restrictions are imposed, but they tend to be less efficient than systems in which all CPUs are
utilized equally.
11.3 Multiprocessing techniques
Systems that treat all CPUs equally are called symmetric multiprocessing (SMP)
systems. In systems where all CPUs are not equal, system resources may be divided in a
number of ways, including asymmetric multiprocessing (ASMP), non-uniform memory
access (NUMA) multiprocessing, and clustered multiprocessing.
11.3.1 Instruction and data streams
In multiprocessing, the processors can be used to execute a single sequence of
instructions in multiple contexts (single-instruction, multiple-data or SIMD, often used in
vector processing), multiple sequences of instructions in a single context (multiple-instruction, single-data or MISD, used for redundancy in fail-safe systems and sometimes
applied to describe pipelined processors or hyperthreading), or multiple sequences of
instructions in multiple contexts (multiple-instruction, multiple-data or MIMD).
11.3.2 Processor coupling
Tightly-coupled multiprocessor systems contain multiple CPUs that are connected at
the bus level. These CPUs may have access to a central shared memory (SMP or UMA), or
may participate in a memory hierarchy with both local and shared memory (NUMA). The
IBM p690 Regatta is an example of a high end SMP system. Intel Xeon processors
dominated the multiprocessor market for business PCs and were the only x86 option till the
release of AMD's Opteron range of processors in 2004. Both ranges of processors had their
own onboard cache but provided access to shared memory; the Xeon processors via a
common pipe and the Opteron processors via independent pathways to the system RAM.
Chip multiprocessing, also known as multi-core computing, involves more than one
processor placed on a single chip and can be thought of as the most extreme form of tightly-coupled multiprocessing. Mainframe systems with multiple processors are often tightly-coupled.
Loosely-coupled multiprocessor systems (often referred to as clusters) are based on
multiple standalone single or dual processor commodity-computers interconnected via a high
speed communication system (Gigabit Ethernet is common). A Linux Beowulf cluster is an
example of a loosely-coupled system.
Tightly-coupled systems perform better and are physically smaller than loosely-coupled systems, but have historically required greater initial investments and may depreciate
rapidly; nodes in a loosely-coupled system are usually inexpensive commodity computers
and can be recycled as independent machines upon retirement from the cluster.
Power consumption is also a consideration. Tightly-coupled systems tend to be much
more energy efficient than clusters. This is because considerable economies can be realized
by designing components to work together from the beginning in tightly-coupled systems,
whereas loosely-coupled systems use components that were not necessarily intended
specifically for use in such systems.
11.3.3 SISD multiprocessing
Processors with simple 8-bit and 16-bit instruction sets execute a single instruction stream on a single data stream, and are therefore SISD machines.
11.3.4 SIMD multiprocessing
SIMD multiprocessing is well suited to parallel or vector processing, in which a very
large set of data can be divided into parts that are individually subjected to identical but
independent operations. A single instruction stream directs the operation of multiple
processing units to perform the same manipulations simultaneously on potentially large
amounts of data.
For certain types of computing applications, this type of architecture can produce
enormous increases in performance, in terms of the elapsed time required to complete a given
task. However, a drawback to this architecture is that a large part of the system falls idle
when applications or system tasks are executed that cannot be divided into units that can be
processed in parallel.
Additionally, applications must be carefully and specially written to take maximum
advantage of the architecture, and often special optimizing compilers designed to produce
code specifically for this environment must be used. Some compilers in this category provide
special constructs or extensions to allow programmers to directly specify operations to be
performed in parallel (e.g., DO FOR ALL statements in the version of FORTRAN used on
the ILLIAC IV, which was a SIMD multiprocessing supercomputer).
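The DO FOR ALL style of data parallelism can be approximated on current systems with an OpenMP directive in C. The sketch below expresses the same idea of applying one operation to every element of an array; a compiler without OpenMP simply runs the loop serially:

#include <stdio.h>

int main(void)
{
    double a[1000], b[1000];

    for (int i = 0; i < 1000; i++)
        b[i] = i;

    /* every iteration performs the same operation on different data */
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++)
        a[i] = 2.0 * b[i] + 1.0;

    printf("a[999] = %.1f\n", a[999]);
    return 0;
}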
SIMD multiprocessing finds wide use in certain domains such as computer simulation,
but is of little use in general-purpose desktop and business computing environments.
11.3.5 MISD multiprocessing
MISD multiprocessing offers mainly the advantage of redundancy, since multiple
processing units perform the same tasks on the same data, reducing the chances of incorrect
results if one of the units fails. MISD architectures may involve comparisons between
processing units to detect failures. Apart from the redundant and fail-safe character of this
type of multiprocessing, it has few advantages, and it is very expensive. It does not improve
performance. It can be implemented in a way that is transparent to software.
11.3.6 MIMD multiprocessing
MIMD multiprocessing architecture is suitable for a wide variety of tasks in which
completely independent and parallel execution of instructions touching different sets of data
can be put to productive use. For this reason, and because it is easy to implement, MIMD
predominates in multiprocessing.
Processing is divided into multiple threads, each with its own hardware processor state,
within a single software-defined process or within multiple processes. Insofar as a system has
multiple threads awaiting dispatch (either system or user threads), this architecture makes
good use of hardware resources.
MIMD does raise issues of deadlock and resource contention, however, since threads
may collide in their access to resources in an unpredictable way that is difficult to manage
efficiently. MIMD requires special coding in the operating system of a computer but does not
require application changes unless the applications themselves use multiple threads (MIMD
is transparent to single-threaded applications under most operating systems, if the
applications do not voluntarily relinquish control to the OS). Both system and user software
may need to use software constructs such as semaphores (also called locks or gates) to
prevent one thread from interfering with another if they should happen to cross paths in
referencing the same data. This gating or locking process increases code complexity, lowers
performance, and greatly increases the amount of testing required, although not usually
enough to negate the advantages of multiprocessing.
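As a concrete illustration of such gating, the following C sketch uses a POSIX mutex so that two threads incrementing a shared counter do not interfere; the counter and the iteration count are illustrative:

#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* gate: only one thread at a time */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter); /* always 200000 with the lock */
    return 0;
}

Without the lock the two increments can interleave and the final value becomes unpredictable, which is exactly the kind of interference the text describes.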
Similar conflicts can arise at the hardware level between CPUs (cache contention and
corruption, for example), and must usually be resolved in hardware, or with a combination of
software and hardware (e.g., cache-clear instructions).
11.4 Fault-tolerant
Fault-tolerant describes a computer system or component designed so that, in the
event that a component fails, a backup component or procedure can immediately take its
place with no loss of service. Fault tolerance can be provided with software, or embedded in
hardware, or provided by some combination.
In the software implementation, the operating system provides an interface that allows
a programmer to "checkpoint" critical data at pre-determined points within a transaction. In
the hardware implementation (for example, with Stratus and its VOS operating system), the
programmer does not need to be aware of the fault-tolerant capabilities of the machine.
At a hardware level, fault tolerance is achieved by duplexing each hardware
component. Disks are mirrored. Multiple processors are "lock-stepped" together and their
outputs are compared for correctness. When an anomaly occurs, the faulty component is
determined and taken out of service, but the machine continues to function as usual.
Fault-tolerance or graceful degradation is the property that enables a system (often
computer-based) to continue operating properly in the event of the failure of (or one or more
faults within) some of its components. If its operating quality decreases at all, the decrease is
proportional to the severity of the failure, as compared to a naively-designed system in which
even a small failure can cause total breakdown. Fault-tolerance is particularly sought-after in
high-availability or life-critical systems.
Fault-tolerance is not just a property of individual machines; it may also characterise
the rules by which they interact. For example, the Transmission Control Protocol (TCP) is
designed to allow reliable two-way communication in a packet-switched network, even in the
presence of communications links which are imperfect or overloaded. It does this by
requiring the endpoints of the communication to expect packet loss, duplication, reordering
and corruption, so that these conditions do not damage data integrity, and only reduce
throughput by a proportional amount.
Data formats may also be designed to degrade gracefully. HTML for example, is
designed to be forward compatible, allowing new HTML entities to be ignored by Web
browsers which do not understand them without causing the document to be unusable.
Recovery from errors in fault-tolerant systems can be characterised as either roll-forward or roll-back. When the system detects that it has made an error, roll-forward recovery
takes the system state at that time and corrects it, to be able to move forward. Roll-back
recovery reverts the system state back to some earlier, correct version, for example using
checkpointing, and moves forward from there. Roll-back recovery requires that the
operations between the checkpoint and the detected erroneous state can be made idempotent.
Some systems make use of both roll-forward and roll-back recovery for different errors or
different parts of one error.
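Roll-back recovery can be sketched very simply: the state is copied at a checkpoint and restored when an error is detected. In the C fragment below the state structure and the error test are invented for illustration:

#include <stdio.h>
#include <string.h>

typedef struct { int step; double value; } state_t;

int main(void)
{
    state_t state = {0, 1.0}, checkpoint;

    memcpy(&checkpoint, &state, sizeof state);     /* take a checkpoint */

    state.step = 7;
    state.value = -3.5;                            /* suppose this result is erroneous */

    if (state.value < 0) {                         /* error detected */
        memcpy(&state, &checkpoint, sizeof state); /* roll back to the checkpoint */
        printf("rolled back to step %d\n", state.step);
    }
    return 0;
}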
Within the scope of an individual system, fault-tolerance can be achieved by
anticipating exceptional conditions and building the system to cope with them, and, in
general, aiming for self-stabilization so that the system converges towards an error-free state.
However, if the consequences of a system failure are catastrophic, or the cost of making it
sufficiently reliable is very high, a better solution may be to use some form of duplication. In
any case, if the consequence of a system failure is catastrophic, the system must be able to
use reversion to fall back to a safe mode. This is similar to roll-back recovery but can be a
human action if humans are present in the loop.
11.4.1 Fault Tolerance Requirements
The basic characteristics of fault tolerance require:
1. No single point of failure
2. No single point of repair
3. Fault isolation to the failing component
4. Fault containment to prevent propagation of the failure
5. Availability of reversion modes
In addition, fault tolerant systems are characterized in terms of both planned service
outages and unplanned service outages. These are usually measured at the application level
and not just at a hardware level. The figure of merit is called availability and is expressed as a
percentage. A five nines system would therefore statistically provide 99.999% availability.
11.4.2 Fault-tolerance by replication, Redundancy and Diversity
Providing spare components addresses the first fundamental characteristic of fault-tolerance in
three ways:
Replication: Providing multiple identical instances of the same system or subsystem,
directing tasks or requests to all of them in parallel, and choosing the correct result on the
basis of a quorum;
Redundancy: Providing multiple identical instances of the same system and switching
to one of the remaining instances in case of a failure (failover);
Diversity: Providing multiple different implementations of the same specification, and
using them like replicated systems to cope with errors in a specific implementation.
A redundant array of independent disks (RAID) is an example of a fault-tolerant
storage device that uses data redundancy.
A lockstep fault-tolerant machine uses replicated elements operating in parallel. At
any time, all the replications of each element should be in the same state. The same inputs are
provided to each replication, and the same outputs are expected. The outputs of the
replications are compared using a voting circuit. A machine with two replications of each
element is termed Dual Modular Redundant (DMR). The voting circuit can then only detect a
mismatch and recovery relies on other methods. A machine with three replications of each
element is termed Triple Modular Redundancy (TMR). The voting circuit can determine
which replication is in error when a two-to-one vote is observed. In this case, the voting
circuit can output the correct result, and discard the erroneous version. After this, the internal
state of the erroneous replication is assumed to be different from that of the other two, and the
voting circuit can switch to a DMR mode. This model can be applied to any larger number of
replications.
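In software terms a TMR voting circuit reduces to a majority function over the three replica outputs. The C sketch below returns the majority value and reports which replica, if any, disagrees; it assumes at most one faulty replica:

#include <stdio.h>

/* Majority vote over three replica outputs; *faulty receives the index
   of a disagreeing replica, or -1 if all three agree. */
int tmr_vote(int r0, int r1, int r2, int *faulty)
{
    if (r0 == r1 && r1 == r2) { *faulty = -1; return r0; }
    if (r0 == r1)             { *faulty = 2;  return r0; }
    if (r0 == r2)             { *faulty = 1;  return r0; }
    *faulty = 0;               /* r1 == r2 outvote r0 */
    return r1;
}

int main(void)
{
    int faulty;
    int out = tmr_vote(42, 42, 41, &faulty);
    printf("output = %d, faulty replica = %d\n", out, faulty);
    return 0;
}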
Lockstep fault tolerant machines are most easily made fully synchronous, with each
gate of each replication making the same state transition on the same edge of the clock, and
the clocks to the replications being exactly in phase. However, it is possible to build lockstep
systems without this requirement.
Bringing the replications into synchrony requires making their internal stored states
the same. They can be started from a fixed initial state, such as the reset state. Alternatively,
the internal state of one replica can be copied to another replica.
One variant of DMR is pair-and-spare. Two replicated elements operate in lockstep as
a pair, with a voting circuit that detects any mismatch between their operations and outputs a
signal indicating that there is an error. Another pair operates exactly the same way. A final
circuit selects the output of the pair that does not proclaim that it is in error. Pair-and-spare
requires four replicas rather than the three of TMR, but has been used commercially.
11.5 Let us sum up
In this lesson we have learnt about
a) the multiprocessing techniques
b) fault tolerance
11.6 Points for discussion
Discuss the following
a) SISD
b) SIMD
c) MISD
d) MIMD
11.7 Model answers to check your progress
The questions given in section 11.6 are explained here.
In computing, SISD (Single Instruction, Single Data) is a term referring to an
architecture in which a single processor executes a single instruction stream, to operate on
data stored in a single memory. This corresponds to the von Neumann architecture.
In computing, SIMD (Single Instruction, Multiple Data) is a technique employed to
achieve data-level parallelism, as in a vector processor. First made popular in large-scale
supercomputers (contrary to MIMD parallelization), smaller-scale SIMD operations have
now become widespread in personal computer hardware.
In computing, MISD (Multiple Instruction, Single Data) is a type of parallel
computing architecture where many functional units perform different operations on the same
data. Pipeline architectures belong to this type, though a purist might say that the data is
different after processing by each stage in the pipeline. Fault-tolerant computers executing
the same instructions redundantly in order to detect and mask errors, in a manner known as
task replication, may be considered to belong to this type. Not many instantiations of this
architecture exist, as MIMD and SIMD are often more appropriate for common data parallel
techniques. Specifically, they allow better scaling and use of computational resources than
MISD does.
In computing, MIMD (Multiple Instruction stream, Multiple Data stream) is a
technique employed to achieve parallelism. Machines using MIMD have a number of
processors that function asynchronously and independently. At any time, different processors
may be executing different instructions on different pieces of data. MIMD architectures may
be used in a number of application areas such as computer-aided design/computer-aided
manufacturing, simulation, modeling, and as communication switches. MIMD machines can
be of either shared memory or distributed memory categories. These classifications are based
on how MIMD processors access memory. Shared memory machines may be of the bus-based, extended, or hierarchical type. Distributed memory machines may have hypercube or
mesh interconnection schemes.
11.8 Lesson - end activities
After learning this chapter, try to discuss among your friends and answer these
questions to check your progress.
a) Discuss about various multiprocessing techniques
b) Discuss about fault tolerance
11.9 References
a) Charles Crowley, Chapter 5, 6 of “Operating Systems – A Design-Oriented Approach”,
Tata McGraw-Hill, 2001
b) H.M. Deitel, Chapter 11 of “Operating Systems”, Second Edition, Pearson Education,
2001
c) Andrew S. Tanenbaum, Chapter 9, 10, 11, 12, 13 of “Modern Operating Systems”, PHI,
1996
d) D.M. Dhamdhere, Chapter 18, 19 of “Systems Programming and Operating Systems”,
Tata McGraw-Hill, 1997
UNIT – IV
LESSON – 12: DEVICE AND DISK MANAGEMENT
CONTENTS
12.1 Aims and Objectives
12.2 Introduction
12.3 Need for Disk Scheduling
12.4 Disk Scheduling Strategies
12.4.1 First Come First Served (FCFS)
12.4.2 Shortest Seek Time First (SSTF)
12.4.3 SCAN
12.4.4 Circular SCAN (C-SCAN)
12.5 RAM (Random access memory)
12.5.1 Overview
12.5.2 Recent developments
12.5.3 Memory wall
12.5.4 DRAM packaging
12.6 Optical Disks
12.7 Let us sum up
12.8 Points for discussion
12.9 Model answers to Check your Progress
12.10 Lesson - end activities
12.11 References
12.1 Aims and Objectives
In this lesson we will learn about the introduction to Disk Scheduling Strategies,
RAM and optical disks.
The objective of this lesson is to make sure that the student understands the following.
a) Disk Scheduling,
b) First Come First Served (FCFS),
c) Shortest Seek Time First (SSTF)
d) SCAN
e) Circular SCAN (C-SCAN)
f) RAM and
g) Optical Disks
12.2 Introduction
In multiprogramming systems several different processes may want to use the
system's resources simultaneously. For example, processes will contend to access an auxiliary
storage device such as a disk. The disk drive needs some mechanism to resolve this
contention, sharing the resource between the processes fairly and efficiently.
A magnetic disk consists of a collection of platters which rotate about a central
spindle. These platters are metal disks covered with magnetic recording material on both
sides. Each disk surface is divided into concentric circles called tracks, and each track is
divided into sectors, each typically containing 512 bytes. While reading and writing the head
moves over the surface of the platters until it finds the track and sector it requires. This is like
finding someone's home by first finding the street (track) and then the particular house
number (sector). There is one head for each surface on which information is stored each on
its own arm. In most systems the arms are connected together so that the heads move in
unison, so that each head is over the same track on each surface. The term cylinder refers to
the collection of all tracks which are under the heads at any time.
In order to satisfy an I/O request the disk controller must first move the head to the
correct track and sector. Moving the head between cylinders takes a relatively long time so in
order to maximize the number of I/O requests which can be satisfied the scheduling policy
should try to minimize the movement of the head. On the other hand, minimizing head
movement by always satisfying the request of the closest location may mean that some
requests have to wait a long time. Thus, there is a trade-off between throughput (the average
number of requests satisfied in unit time) and response time (the average time between a
request arriving and it being satisfied).
12.3 Need for Disk Scheduling
Access time has two major components namely seek time and rotational latency. Seek
time is the time for the disk arm to move the heads to the cylinder containing the desired
sector. Rotational latency is the additional time waiting for the disk to rotate the desired
sector to the disk head. In order to have a fast access time we have to minimize the seek time,
which is approximately proportional to the seek distance.
Disk bandwidth is the total number of bytes transferred, divided by the total time
between the first request for service and the completion of the last transfer. The operating
system is responsible for using hardware efficiently for the disk drives, to have a fast access
time and disk bandwidth. This in turn needs a good disk scheduling.
12.4 Disk Scheduling Strategies
Three criteria to measure strategies are throughput, mean response time, and variance
of response times. Throughput is the number of requests serviced per unit of time. Mean
response time is the average time spent waiting for request to be serviced. Variance of
response times is the measure of the predictability of response times. Hence the overall goals
of the disk scheduling strategies are to maximize the throughput and minimize both response
time and variance of response time.
12.4.1 First Come First Served (FCFS)
The disk controller processes the I/O requests in the order in which they arrive, thus
moving backwards and forwards across the surface of the disk to get to the next requested
location each time. Since no reordering of request takes place the head may move almost
randomly across the surface of the disk. This policy aims to minimize response time with
little regard for throughput.
The figure below illustrates this method. It depicts requests for cylinders 63, 33, 72, 47, 8,
99, 74, 52 and 75. If the requests arrive in that sequence, they are also serviced in that
sequence, causing the head movement shown in the figure.
FCFS is a 'just' algorithm, because the process that makes a request first is served first,
but it may not be the best in terms of reducing the head movement, as is clear from the figure.
[Figure: Disk request pattern, and the seek pattern under the FCFS strategy]
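The total head movement for this request pattern can be computed with the short C program below; the starting head position (cylinder 50) is an assumed value, since the lesson does not state one:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* the request queue from the example above */
    int req[] = {63, 33, 72, 47, 8, 99, 74, 52, 75};
    int n = 9, head = 50, total = 0;

    for (int i = 0; i < n; i++) {
        total += abs(req[i] - head);  /* distance moved for this request */
        head = req[i];
    }
    printf("FCFS total head movement: %d cylinders\n", total);
    return 0;
}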
First-come-first-served (FCFS) scheduling has two major drawbacks: (i) seeking to
randomly distributed locations results in long waiting times, and (ii) under heavy loads the
system can become overwhelmed. To minimize delays, requests should therefore be
reordered and serviced with the least mechanical motion.
12.4.2 Shortest Seek Time First (SSTF)
Each time an I/O request has been completed the disk controller selects the waiting
request whose sector location is closest to the current position of the head. The movement
across the surface of the disk is still apparently random but the time spent in movement is
minimized. This policy will have better throughput than FCFS but a request may be delayed
for a long period if many closely located requests arrive just after it.
[Figure: Seek pattern under the SSTF strategy]
Advantages of SSTF are higher throughput and lower response times than FCFS and
it is a reasonable solution for batch processing systems. The disadvantages of SSTF are: (i) it
does not ensure fairness, (ii) there is a possibility of indefinite postponement, (iii) there
will be a high variance of response times and (iv) the response times generally will be
unacceptable for interactive systems.
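For comparison with the FCFS sketch above, the following C program services the same request queue in shortest-seek-time-first order, again assuming a starting head position of cylinder 50:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int req[] = {63, 33, 72, 47, 8, 99, 74, 52, 75};
    int served[9] = {0};
    int n = 9, head = 50, total = 0;

    for (int k = 0; k < n; k++) {
        int best = -1, bestdist = 0;
        for (int i = 0; i < n; i++) {   /* pick the closest pending request */
            if (served[i])
                continue;
            int d = abs(req[i] - head);
            if (best < 0 || d < bestdist) { best = i; bestdist = d; }
        }
        served[best] = 1;               /* service it and move the head there */
        total += bestdist;
        head = req[best];
    }
    printf("SSTF total head movement: %d cylinders\n", total);
    return 0;
}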
12.4.3 SCAN
The drive head sweeps across the entire surface of the disk, visiting the outermost
cylinders before changing direction and sweeping back to the innermost cylinders. It selects
the next waiting requests whose location it will reach on its path backwards and forwards
across the disk. Thus, the movement time should be less than FCFS but the policy is clearly
fairer than SSTF.
12.4.4 Circular SCAN (C-SCAN)
C-SCAN is similar to SCAN but I/O requests are only satisfied when the drive head is
traveling in one direction across the surface of the disk. The head sweeps from the innermost
cylinder to the outermost cylinder satisfying the waiting requests in order of their locations.
When it reaches the outermost cylinder it sweeps back to the innermost cylinder without
satisfying any requests and then starts again.
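The service order produced by C-SCAN can be sketched as follows in C: pending requests are sorted by cylinder, the sweep services those at or beyond the head, and the rest are served after the jump back. The request queue and starting head position are the same assumed values as before:

#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

int main(void)
{
    int req[] = {63, 33, 72, 47, 8, 99, 74, 52, 75};
    int n = 9, head = 50;

    qsort(req, n, sizeof req[0], cmp);  /* service in cylinder order */

    printf("C-SCAN service order:");
    for (int i = 0; i < n; i++)         /* sweep onward from the head */
        if (req[i] >= head)
            printf(" %d", req[i]);
    for (int i = 0; i < n; i++)         /* after jumping back to the start */
        if (req[i] < head)
            printf(" %d", req[i]);
    printf("\n");
    return 0;
}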
12.5 RAM (Random access memory)
Random access memory (usually known by its acronym, RAM) is a type of data
storage used in computers. It takes the form of integrated circuits that allow the stored data
to be accessed in any order — that is, at random and without the physical movement of the
storage medium or a physical reading head. RAM is a volatile memory as the information or
instructions stored in it will be lost if the power is switched off.
The word "random" refers to the fact that any piece of data can be returned in a
constant time, regardless of its physical location and whether or not it is related to the
previous piece of data. This contrasts with storage mechanisms such as tapes, magnetic discs
and optical discs, which rely on the physical movement of the recording medium or a reading
head. In these devices, the movement takes longer than the data transfer, and the retrieval
time varies depending on the physical location of the next item.
[Figure: A 1 GB DDR RAM module]
RAM is usually writable as well as readable, so "RAM" is often used interchangeably
with "read-write memory". The alternative to this is "ROM", or Read Only Memory. Most
types of RAM lose their data when the computer powers down. "Flash memory" is a
ROM/RAM hybrid that can be written to, but which does not require power to maintain its
contents. RAM is not strictly the opposite of ROM, however. The word random indicates a
contrast with serial access or sequential access memory.
"Random access" is also the name of an indexing method: hence, disk storage is often
called "random access" because the reading head can move relatively quickly from one piece
of data to another, and does not have to read all the data in between. However the final "M" is
crucial: "RAM" (provided there is no additional term as in "DVD-RAM") always refers to a
solid-state device.
Many CPU-based designs actually have a memory hierarchy consisting of registers,
on-die SRAM caches, DRAM, paging systems, and virtual memory or swap space on a hard drive. This entire pool of memory may be referred to as "RAM" by many developers, even
though the various subsystems can have very different access times, violating the original
concept behind the "random access" term in RAM. Even within a hierarchy level such as
DRAM, the specific row/column/bank/rank/channel/interleave organization of the
components makes the access time variable, although not to the extent that rotating storage
media or a tape is variable.
12.5.1 Overview
The key benefit of RAM over types of storage which require physical movement is
that retrieval times are short and consistent. Short because no physical movement is
necessary, and consistent because the time taken to retrieve a piece of data does not depend
on its current distance from a physical head; it requires practically the same amount of time to
access any piece of data stored in a RAM chip. Most other technologies have inherent delays
for reading a particular bit or byte. The disadvantage of RAM over physically moving media
is cost, and the loss of data when power is turned off.
Because of this speed and consistency, RAM is used as 'main memory' or primary
storage: the working area used for loading, displaying and manipulating applications and
data. In most personal computers, the RAM is not an integral part of the motherboard or
CPU—it comes in the easily upgraded form of modules called memory sticks or RAM sticks
about the size of a few sticks of chewing gum. These can quickly be removed and replaced. A
smaller amount of random-access memory is also integrated with the CPU, but this is usually
referred to as "cache" memory, rather than RAM.
Modern RAM generally stores a bit of data as either a charge in a capacitor, as in
dynamic RAM, or the state of a flip-flop, as in static RAM. Some types of RAM can detect or
correct random faults called memory errors in the stored data, using RAM parity and error
correction codes.
Many types of RAM are volatile, which means that unlike some other forms of
computer storage such as disk storage and tape storage, they lose all data when the computer
is powered down. For these reasons, nearly all PCs use disks as "secondary storage".
Software can "partition" a portion of a computer's RAM, allowing it to act as a much
faster hard drive that is called a RAM disk. Unless the memory used is non-volatile, a RAM
disk loses the stored data when the computer is shut down. However, volatile memory can
retain its data when the computer is shut down if it has a separate power source, usually a
battery.
If a computer becomes low on RAM during intensive application cycles, the computer
can resort to so-called virtual memory. In this case, the computer temporarily uses hard drive
space as additional memory. Constantly relying on this type of backup memory is called
thrashing, which is generally undesirable, as virtual memory lacks the advantages of RAM. In
order to reduce the dependency on virtual memory, more RAM can be installed.
12.5.2 Recent developments
Currently, several types of non-volatile RAM are under development, which will
preserve data while powered down. The technologies used include carbon nanotubes and the
magnetic tunnel effect.
In summer 2003, a 128 KB magnetic RAM chip manufactured with 0.18 µm
technology was introduced. The core technology of MRAM is based on the magnetic tunnel
effect. In June 2004, Infineon Technologies unveiled a 16 MB prototype again based on 0.18
µm technology.
In 2004, Nantero built a functioning carbon nanotube memory prototype with a 10 GB array.
In 2006, solid state memory came of age, especially when implemented as "solid
state disks", with capacities exceeding 150 gigabytes and speeds far exceeding traditional
disks. This development has started to blur the definition between traditional random access
memory and disks, dramatically reducing the difference in performance.
12.5.3 Memory wall
The "memory wall" is the growing disparity of speed between CPU and memory
outside the CPU chip. An important reason of this disparity is the limited communication
bandwidth beyond chip boundaries. From 1986 to 2000, CPU speed improved at an annual
rate of 55% while memory speed only improved at 10%. Given these trends, it was expected
that memory latency would become an overwhelming bottleneck in computer performance.
Currently, CPU speed improvements have slowed significantly partly due to major physical
barriers and partly because current CPU designs have already hit the memory wall in some
sense. Intel summarized these causes in their Platform 2015 documentation:
“First of all, as chip geometries shrink and clock frequencies rise, the transistor
leakage current increases, leading to excess power consumption and heat (more on power
consumption below). Secondly, the advantages of higher clock speeds are in part negated by
memory latency, since memory access times have not been able to keep pace with increasing
clock frequencies. Third, for certain applications, traditional serial architectures are becoming
less efficient as processors get faster (due to the so-called Von Neumann bottleneck), further
undercutting any gains that frequency increases might otherwise buy. In addition, resistance-capacitance (RC) delays in signal transmission are growing as feature sizes shrink, imposing
an additional bottleneck that frequency increases don't address.”
The RC delays in signal transmission were also noted in Clock Rate versus IPC: The
End of the Road for Conventional Microarchitectures which projects a maximum of 12.5%
average annual CPU performance improvement between 2000 and 2014. The data on Intel
Processors clearly shows a slowdown in performance improvements in recent processors.
However, Intel's new processors, Core 2 Duo (codenamed Conroe) show a significant
improvement over previous Pentium 4 processors; due to a more efficient architecture,
performance increased while clock rate actually decreased.
12.5.4 DRAM packaging
For economic reasons, the large (main) memories found in personal computers,
workstations, and non-handheld game-consoles (such as Playstation and Xbox) normally
consist of dynamic RAM (DRAM). Other parts of the computer, such as cache memories
and data buffers in hard disks, normally use static RAM (SRAM).
12.6 Optical Disks
An optical disc is an electronic data storage medium that can be written to and read
using a low-powered laser beam. Originally developed in the late 1960s, the first optical disc,
created by James T. Russell, stored data as micron-wide dots of light and dark. A laser read
the dots, and the data was converted to an electrical signal, and finally to audio or visual
output. However, the technology didn't appear in the marketplace until Philips and Sony
came out with the compact disc (CD) in 1982. Since then, there has been a constant
succession of optical disc formats, first in CD formats, followed by a number of DVD
formats.
Optical disc offers a number of advantages over magnetic storage media. An optical
disc holds much more data. The greater control and focus possible with laser beams (in
comparison to tiny magnetic heads) means that more data can be written into a smaller space.
Storage capacity increases with each new generation of optical media. Emerging standards,
such as Blu-ray, offer up to 27 gigabytes (GB) on a single-sided 12-centimeter disc. In
comparison, a diskette, for example, can hold 1.44 megabytes (MB). Optical discs are
inexpensive to manufacture and data stored on them is relatively impervious to most
environmental threats, such as power surges, or magnetic disturbances.
12.7 Let us sum up
In this lesson we have learnt about
a) the various disk scheduling strategies
b) the Random access memory
c) optical disks
12.8 Points for discussion
a) Compare disk scheduling algorithms
12.9 Model answers to Check your Progress
The answer to the question given in section 12.8 is discussed here to help the
students check their progress.
 FCFS (First Come, First Served)
o perform operations in order requested
o no reordering of work queue
o no starvation: every request is serviced
o poor performance
 SSTF (Shortest Seek Time First)
o after a request, go to the closest request in the work queue, regardless of
direction
o reduces total seek time compared to FCFS
o Disadvantages
 starvation is possible; stay in one area of the disk if very busy
 switching directions slows things down
 SCAN
o go from the outside to the inside servicing requests and then back from the
outside to the inside servicing requests.
o repeats this over and over.
o reduces variance compared to SSTF.
 LOOK
o like SCAN but stops moving inwards (or outwards) when no more requests in
that direction exist.
 C-SCAN (circular scan)
o moves inwards servicing requests until it reaches the innermost cylinder; then
jumps to the outside cylinder of the disk without servicing any requests.
o repeats this over and over.
o variant: service requests from inside to outside, and then skip back to the
innermost cylinder.
 C-LOOK
o moves inwards servicing requests until there are no more requests in that
direction, then it jumps to the outermost outstanding request.
o repeats this over and over.
o variant: service requests from inside to outside, then skip back to the
innermost request.
12.10 Lesson - end activities
After learning this chapter, try to discuss among your friends and answer these questions to
check your progress.
a) Discuss about various disk scheduling strategies
b) Explain about FCFS
c) Explain about SSTF
d) Explain about SCAN
e) Discuss about RAM
f) Discuss about Optical storages
12.11 References
 Charles Crowley, Chapter 15 of “Operating Systems – A Design-Oriented Approach”,
Tata McGraw-Hill, 2001
 H.M. Deitel, Chapter 12 of “Operating Systems”, Second Edition, Pearson Education,
2001
 Andrew S. Tanenbaum, Chapter 5 of “Modern Operating Systems”, PHI, 1996
 D.M. Dhamdhere, Chapter 16, 17 of “Systems Programming and Operating Systems”,
Tata McGraw-Hill, 1997
LESSON – 13: FILE SYSTEMS AND ORGANIZATION
CONTENTS
13.1 Aims and Objectives
13.2 Introduction
13.2.1 Aspects of file systems
13.3 Types of file systems
13.3.1 Disk file systems
13.3.2 Flash file systems
13.3.3 Database file systems
13.3.4 Transactional file systems
13.3.5 Network file systems
13.3.6 Special purpose file systems
13.3.7 Flat file systems
13.4 File systems and operating systems
13.5 Multiple Filesystem Support
13.6 Organization
13.7 Examples of file systems
13.7.1 File systems under Unix and Linux systems
13.7.2 File systems under Mac OS X
13.7.3 File systems under Plan 9 from Bell Labs
13.7.4 File systems under Microsoft Windows
13.8 Let us sum up
13.9 Points for Discussion
13.10 Model Answers to Check your Progress
13.11 Lesson - end activities
13.12 References
13.1 Aims and Objectives
In this lesson we will learn the basics of file systems, their aspects, and the various
types of file systems used under different operating systems.
The objective of this lesson is to make the student aware of the basic concepts of
file systems and their organization.
13.2 Introduction
In computing, a file system (often also written as filesystem) is a method for storing
and organizing computer files and the data they contain to make it easy to find and access
them. File systems may use a data storage device such as a hard disk or CD-ROM and
involve maintaining the physical location of the files; they might provide access to data on a
file server by acting as clients for a network protocol (e.g., NFS, SMB, or 9P clients); or they
may be virtual and exist only as an access method for virtual data.
More formally, a file system is a set of abstract data types that are implemented for
the storage, hierarchical organization, manipulation, navigation, access, and retrieval of data.
File systems share much in common with database technology, but it is debatable whether a
file system can be classified as a special-purpose database (DBMS).
13.2.1 Aspects of file systems
The most familiar file systems make use of an underlying data storage device that
offers access to an array of fixed-size blocks, sometimes called sectors, generally 512 bytes
each. The file system software is responsible for organizing these sectors into files and
directories, and keeping track of which sectors belong to which file and which are not being
used.
However, file systems need not make use of a storage device at all. A file system can
be used to organize and represent access to any data, whether it be stored or dynamically
generated (e.g., from a network connection).
Whether the file system has an underlying storage device or not, file systems typically
have directories which associate file names with files, usually by connecting the file name to
an index into a file allocation table of some sort, such as the FAT in an MS-DOS file system,
or an inode in a Unix-like file system. Directory structures may be flat, or allow hierarchies
where directories may contain subdirectories. In some file systems, file names are structured,
with special syntax for filename extensions and version numbers. In others, file names are
simple strings, and per-file metadata is stored elsewhere.
Other bookkeeping information is typically associated with each file within a file
system. The length of the data contained in a file may be stored as the number of blocks
allocated for the file or as an exact byte count. The time that the file was last modified may be
stored as the file's timestamp. Some file systems also store the file creation time, the time it
was last accessed, and the time that the file's meta-data was changed. (Note that many early
PC operating systems did not keep track of file times.) Other information can include the
file's device type (e.g., block, character, socket, subdirectory, etc.), its owner user-ID and
group-ID, and its access permission settings (e.g., whether the file is read-only, executable,
etc.).
The hierarchical file system was an early research interest of Dennis Ritchie of Unix
fame; previous implementations were restricted to only a few levels, notably the IBM
implementations, even of their early databases like IMS. After the success of Unix, Ritchie
extended the file system concept to every object in his later operating system developments,
such as Plan 9 and Inferno.
Traditional file systems offer facilities to create, move and delete both files and
directories. They lack facilities to create additional links to a directory (hard links in Unix),
rename parent links (".." in Unix-like OS), and create bidirectional links to files.
Traditional file systems also offer facilities to truncate, append to, create, move, delete
and in-place modify files. They do not offer facilities to prepend to or truncate from the
beginning of a file, let alone arbitrary insertion into or deletion from a file. The operations
provided are highly asymmetric and lack the generality to be useful in unexpected contexts.
For example, interprocess pipes in Unix have to be implemented outside of the file system
because the pipes concept does not offer truncation from the beginning of files.
Secure access to basic file system operations can be based on a scheme of access
control lists or capabilities. Research has shown access control lists to be difficult to secure
properly, which is why research operating systems tend to use capabilities. Commercial file
systems still use access control lists.
Arbitrary attributes can be associated on advanced file systems, such as XFS,
ext2/ext3, some versions of UFS, and HFS+, using extended file attributes. This feature is
implemented in the kernels of Linux, FreeBSD and Mac OS X operating systems, and allows
metadata to be associated with the file at the file system level. This, for example, could be the
author of a document, the character encoding of a plain-text document, or a checksum.
13.3 Types of file systems
File system types can be classified into disk file systems, network file systems and
special purpose file systems.
13.3.1 Disk file systems
A disk file system is a file system designed for the storage of files on a data storage
device, most commonly a disk drive, which might be directly or indirectly connected to the
computer. Examples of disk file systems include FAT, FAT32, NTFS, HFS and HFS+, ext2,
ext3, ISO 9660, ODS-5, and UDF. Some disk file systems are journaling file systems or
versioning file systems.
13.3.2 Flash file systems
A flash file system is a file system designed for storing files on flash memory devices.
These are becoming more prevalent as the number of mobile devices is increasing, and the
capacity of flash memories catches up with hard drives.
While a block device layer can emulate hard drive behavior and store regular file
systems on a flash device, this is suboptimal for several reasons:
Erasing blocks: Flash memory blocks have to be explicitly erased before they can be
written to. The time taken to erase blocks can be significant, thus it is beneficial to erase
unused blocks while the device is idle.
Random access: Disk file systems are optimized to avoid disk seeks whenever
possible, due to the high cost of seeking. Flash memory devices impose no seek latency.
Wear levelling: Flash memory devices tend to "wear out" when a single block is
repeatedly overwritten; flash file systems try to spread out writes as evenly as possible.
It turns out that log-structured file systems have all the desirable properties for a flash
file system. Such file systems include JFFS2 and YAFFS.
13.3.3 Database file systems
A new concept for file management is the concept of a database-based file system.
Instead of, or in addition to, hierarchical structured management, files are identified by their
characteristics, like type of file, topic, author, or similar metadata.
13.3.4 Transactional file systems
This is a special kind of file system in that it logs events or transactions to files. Each
operation that you do may involve changes to a number of different files and disk structures.
In many cases, these changes are related, meaning that it is important that they all be
executed at the same time. Take for example a bank sending another bank some money
electronically. The bank's computer will "send" the transfer instruction to the other bank and
also update its own records to indicate the transfer has occurred. If for some reason the
computer crashes before it has had a chance to update its own records, then on reset, there
will be no record of the transfer but the bank will be missing some money. A transactional
system can rebuild the actions by resynchronizing the "transactions" on both ends to correct
the failure. All transactions can be saved as well, providing a complete record of what was
done and where. This type of file system is designed and intended to be fault tolerant, and
necessarily incurs a high degree of overhead.
13.3.5 Network file systems
A network file system is a file system that acts as a client for a remote file access
protocol, providing access to files on a server. Examples of network file systems include
clients for the NFS, SMB protocols, and file-system-like clients for FTP and WebDAV.
13.3.6 Special purpose file systems
A special purpose file system is basically any file system that is not a disk file system or
network file system. This includes systems where the files are arranged dynamically by
software, intended for such purposes as communication between computer processes or
temporary file space.
Special purpose file systems are most commonly used by file-centric operating systems
such as Unix. Examples include the procfs (/proc) file system used by some Unix variants,
which grants access to information about processes and other operating system features.
Deep space science exploration craft, like Voyager I & II, used digital tape-based special
file systems. Most modern space exploration craft, like Cassini-Huygens, use real-time
operating system (RTOS) file systems or RTOS-influenced file systems. The Mars Rovers are
one such example of an RTOS file system, important in this case because it is implemented in
flash memory.
13.3.7 Flat file systems
In a flat file system, there are no subdirectories—everything is stored at the same (root)
level on the media, be it a hard disk, floppy disk, etc. While simple, this system rapidly
becomes inefficient as the number of files grows, and makes it difficult for users to organise
data into related groups.
Like many small systems before it, the original Apple Macintosh featured a flat file
system, called Macintosh File System. Its version of Mac OS was unusual in that the file
management software (Macintosh Finder) created the illusion of a partially hierarchical filing
system on top of MFS. This structure meant that every file on a disk had to have a unique
name, even if it appeared to be in a separate folder. MFS was quickly replaced with
Hierarchical File System, which supported real directories.
13.4 File systems and operating systems
Most operating systems provide a file system, and a file system is an integral part of any
modern operating system. Early microcomputer operating systems' only real task was file
management — a fact reflected in their names (see DOS). Some early operating systems had
a separate component for handling file systems which was called a disk operating system. On
some microcomputers, the disk operating system was loaded separately from the rest of the
operating system. On early operating systems, there was usually support for only one, native,
unnamed file system; for example, CP/M supports only its own file system, which might be
called "CP/M file system" if needed, but which didn't bear any official name at all.
Because of this, there needs to be an interface provided by the operating system software
between the user and the file system. This interface can be textual (such as provided by a
command line interface, such as the Unix shell, or OpenVMS DCL) or graphical (such as
provided by a graphical user interface, such as file browsers). If graphical, the metaphor of
the folder, containing documents, other files, and nested folders is often used (see also:
directory and folder).
13.5 Multiple Filesystem Support
With the expansion of network computing, it became desirable to support both local
and remote filesystems. To simplify the support of multiple filesystems, the developers added
a new virtual node or vnode interface to the kernel. The set of operations exported from the
vnode interface appear much like the filesystem operations previously supported by the local
filesystem. However, they may be supported by a wide range of filesystem types:
• Local disk-based filesystems
• Files imported using a variety of remote filesystem protocols
• Read-only CD-ROM filesystems
• Filesystems providing special-purpose interfaces - for example, the /proc filesystem
A few variants of 4.4BSD, such as FreeBSD, allow filesystems to be loaded
dynamically when the filesystems are first referenced by the mount system call.
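To illustrate the idea, here is a minimal C sketch of a vnode-style dispatch table; all names (struct vnodeops, vop_read, and so on) are hypothetical and greatly simplified compared with any real kernel:

struct vnode;

/* Each filesystem type supplies its own table of function pointers, so the
 * filesystem-independent layer can operate on a file without knowing whether
 * it is local, remote, or synthetic. */
struct vnodeops {
    int (*vop_open)(struct vnode *vn, int flags);
    int (*vop_read)(struct vnode *vn, void *buf, long len, long off);
    int (*vop_write)(struct vnode *vn, const void *buf, long len, long off);
    int (*vop_close)(struct vnode *vn);
};

struct vnode {
    const struct vnodeops *ops;  /* filled in when the filesystem is mounted */
    void *fs_private;            /* filesystem-specific state (inode, remote handle, ...) */
};

/* The kernel dispatches through the table rather than calling a filesystem directly. */
static inline int vn_read(struct vnode *vn, void *buf, long len, long off)
{
    return vn->ops->vop_read(vn, buf, len, off);
}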
13.6 Organization
1. A file is organized logically as a sequence of records.
2. Records are mapped onto disk blocks.
3. Files are provided as a basic construct in operating systems, so we assume the
existence of an underlying file system.
4. Blocks are of a fixed size determined by the operating system.
5. Record sizes vary.
6. In a relational database, tuples of distinct relations may be of different sizes.
7. One approach to mapping a database to files is to store records of one length in a
given file.
8. An alternative is to structure files to accommodate variable-length records. (Fixed-length is easier to implement.)
13.7 Examples of file systems
13.7.1 File systems under Unix and Linux systems
Unix operating systems create a virtual file system, which makes all the files on all
the devices appear to exist in a single hierarchy. This means, in Unix, there is one root
directory, and every file existing on the system is located under it somewhere. Furthermore,
the Unix root directory does not have to be in any physical place. It might not be on your first
hard drive - it might not even be on your computer. Unix can use a network shared resource
as its root directory.
Unix assigns a device name to each device, but this is not how the files on that device
are accessed. Instead, to gain access to files on another device, you must first inform the
operating system where in the directory tree you would like those files to appear. This
process is called mounting a file system. For example, to access the files on a CD-ROM, one
must tell the operating system "Take the file system from this CD-ROM and make it appear
under such-and-such directory". The directory given to the operating system is called the
mount point - it might, for example, be /media. The /media directory exists on many Unix
systems (as specified in the Filesystem Hierarchy Standard) and is intended specifically for
use as a mount point for removable media such as CDs, DVDs and floppy disks. It may
be empty, or it may contain subdirectories for mounting individual devices. Generally, only
the administrator (i.e. root user) may authorize the mounting of file systems.
Unix-like operating systems often include software and tools that assist in the
mounting process and provide new functionality. Some of these strategies have been coined
"auto-mounting" as a reflection of their purpose.
1. In many situations, file systems other than the root need to be available as soon as the
operating system has booted. All Unix-like systems therefore provide a facility for
mounting file systems at boot time. System administrators define these file systems in
the configuration file fstab, which also indicates options and mount points.
2. In some situations, there is no need to mount certain file systems at boot time,
although their use may be desired thereafter. There are some utilities for Unix-like
systems that allow the mounting of predefined file systems upon demand.
3. Removable media have become very common with microcomputer platforms. They
allow programs and data to be transferred between machines without a physical
connection. Common examples include USB flash drives, CD-ROMs and DVDs.
Utilities have therefore been developed to detect the presence and availability of a
medium and then mount that medium without any user intervention.
4. Progressive Unix-like systems have also introduced a concept called supermounting.
For example, a floppy disk that has been supermounted can be physically removed
from the system. Under normal circumstances, the disk should have been
synchronised and then unmounted before its removal. Provided synchronisation has
occurred, a different disk can be inserted into the drive. The system automatically
notices that the disk has changed and updates the mount point contents to reflect the
new medium. Similar functionality is found on standard Windows machines.
5. A similar innovation preferred by some users is the use of autofs, a system that, like
supermounting, eliminates the need for manual mounting commands. The difference
from supermount, apart from compatibility with a greater range of applications such
as access to file systems on network servers, is that devices are mounted transparently
when requests to their file systems are made, rather than relying on events such as the
insertion of media, which is more appropriate for removable media.
13.7.2 File systems under Mac OS X
Mac OS X uses a file system that it inherited from Mac OS called HFS Plus. HFS
Plus is a metadata-rich and case preserving file system. Due to the Unix roots of Mac OS X,
Unix permissions were added to HFS Plus. Later versions of HFS Plus added journaling to
prevent corruption of the file system structure and introduced a number of optimizations to
the allocation algorithms in an attempt to defragment files automatically without requiring an
external defragmenter.
Filenames can be up to 255 characters. HFS Plus uses Unicode to store filenames. On
Mac OS X, the filetype can come from the type code stored in the file's metadata or from the filename.
HFS Plus has three kinds of links: Unix-style hard links, Unix-style symbolic links
and aliases. Aliases are designed to maintain a link to their original file even if they are
moved or renamed; they are not interpreted by the file system itself, but by the File Manager
code in userland.
Mac OS X also supports the UFS file system, derived from the BSD Unix Fast File
System via NeXTSTEP.
13.7.3 File systems under Plan 9 from Bell Labs
Plan 9 from Bell Labs was originally designed to extend some of Unix's good points,
and to introduce some new ideas of its own while fixing the shortcomings of Unix.
With respect to file systems, the Unix system of treating things as files was continued,
but in Plan 9, everything is treated as a file, and accessed as a file would be (i.e., no ioctl or
mmap). Perhaps surprisingly, while the file interface is made universal, it is also simplified
considerably: for example, symlinks, hard links and suid are made obsolete, and an atomic
create/open operation is introduced. More importantly, the set of file operations becomes well
defined, and subversions of it like ioctl are eliminated.
Secondly, the underlying 9P protocol was used to remove the difference between
local and remote files (except for a possible difference in latency). This has the advantage
that a device or devices, represented by files, on a remote computer could be used as though
it were the local computer's own device(s). This means that under Plan 9, multiple file servers
provide access to devices, classing them as file systems. Servers for "synthetic" file systems
can also run in user space bringing many of the advantages of micro kernel systems while
maintaining the simplicity of the system.
Everything on a Plan 9 system has an abstraction as a file; networking, graphics,
debugging, authentication, capabilities, encryption, and other services are accessed via I/O
operations on file descriptors. For example, this allows the use of the IP stack of a gateway
machine without need of NAT, or provides a network-transparent window system without the
need of any extra code.
Another example: a Plan 9 application receives FTP service by opening an FTP site.
The ftpfs server handles the open by essentially mounting the remote FTP site as part of the
local file system. With ftpfs as an intermediary, the application can now use the usual
file-system operations to access the FTP site as if it were part of the local file system. A further
example is the mail system, which uses file servers that synthesize virtual files and directories
to represent a user mailbox as /mail/fs/mbox. The wikifs provides a file system interface to
a wiki.
These file systems are organized with the help of private, per-process namespaces,
allowing each process to have a different view of the many file systems that provide
resources in a distributed system.
13.7.4 File systems under Microsoft Windows
Windows makes use of the FAT and NTFS (New Technology File System) file
systems. The FAT (File Allocation Table) filing system, supported by all versions of
Microsoft Windows, was an evolution of that used in Microsoft's earlier operating system
(MS-DOS which in turn was based on 86-DOS). FAT ultimately traces its roots back to the
short-lived M-DOS project and Standalone Disk BASIC before it. Over the years various
features have been added to it, inspired by similar features found on file systems used by
operating systems such as UNIX.
Older versions of the FAT file system (FAT12 and FAT16) had file name length
limits, a limit on the number of entries in the root directory of the file system and had
restrictions on the maximum size of FAT-formatted disks or partitions. Specifically, FAT12
and FAT16 had a limit of 8 characters for the file name, and 3 characters for the extension.
This is commonly referred to as the 8.3 filename limit. VFAT, which was an extension to
FAT12 and FAT16 introduced in Windows NT 3.5 and subsequently included in Windows
95, allowed long file names (LFN). FAT32 also addressed many of the limits in FAT12 and
FAT16, but remains limited compared to NTFS.
NTFS, introduced with the Windows NT operating system, allowed ACL-based
permission control. Hard links, multiple file streams, attribute indexing, quota tracking,
compression and mount-points for other file systems (called "junctions") are also supported,
though not all these features are well-documented.
Unlike many other operating systems, Windows uses a drive letter abstraction at the
user level to distinguish one disk or partition from another. For example, the path
C:\WINDOWS represents a directory WINDOWS on the partition represented by the letter C. The
C drive is most commonly used for the primary hard disk partition, on which Windows is
installed and from which it boots. This "tradition" has become so firmly ingrained that bugs
came about in older versions of Windows which made assumptions that the drive that the
operating system was installed on was C. The tradition of using "C" for the drive letter can be
traced to MS-DOS, where the letters A and B were reserved for up to two floppy disk drives;
in a common configuration, A would be the 3½-inch floppy drive, and B the 5¼-inch one.
Network drives may also be mapped to drive letters.
13.8 Let us sum up
In this lesson we have learnt about
a) the various types of file systems
b) the multiple file system support
c) and examples of file systems
13.9 Points for Discussion
a) Define file system
b) Define flat file system
13.10 Model Answers to Check your Progress
In order to check your progress, answer for the first question is given here
In computing, a file system (often also written as filesystem) is a method for storing and
organizing computer files and the data they contain to make it easy to find and access them.
File systems may use a data storage device such as a hard disk or CD-ROM and involve
maintaining the physical location of the files; they might provide access to data on a file
server by acting as clients for a network protocol (e.g., NFS, SMB, or 9P clients); or they
may be virtual and exist only as an access method for virtual data (e.g., procfs).
More formally, a file system is a set of abstract data types that are implemented for the
storage, hierarchical organization, manipulation, navigation, access, and retrieval of data. File
systems share much in common with database technology, but it is debatable whether a file
system can be classified as a special-purpose database (DBMS)
13.11 Lesson - end activities
After learning this chapter, try to discuss among your friends and answer these questions
to check your progress.
a) Discuss about various file systems
b) Discuss about UNIX and Linux file systems
c) Discuss about Microsoft Windows file systems
13.12 References
• Charles Crowley, Chapters 16, 17 of “Operating Systems – A Design-Oriented Approach”, Tata McGraw-Hill, 2001
• H.M. Deitel, Chapter 13 of “Operating Systems”, Second Edition, Pearson Education, 2001
• Andrew S. Tanenbaum, Chapter 4 of “Modern Operating Systems”, PHI, 1996
• D.M. Dhamdhere, Chapter 17 of “Systems Programming and Operating Systems”, Tata McGraw-Hill, 1997
LESSON – 14: FILE ALLOCATION
CONTENTS
14.1 Aims and Objectives
14.2 Introduction
14.3 Free Space Management
14.4 Contiguous allocation
14.5 Linked allocation
14.6 Indexed allocation
14.7 Implementation issues
14.7.1 Memory allocation
14.7.2 Fixed Sized Allocation
14.7.3 Variable Size Allocation
14.7.4 Memory Allocation with PAGING
14.7.5 Memory Mapped files
14.7.6 Hardware support
14.7.7 Copy-on-Write
14.8 Let us sum up
14.9 Points for discussion
14.10 Model answer to check your progress
14.11 Lesson - end activities
14.12 References
14.1 Aims and Objectives
In this lesson we will learn about the basics of file allocation.
The objective of this lesson is to make the student aware of the basic concepts of the
following
a) Free Space Management,
b) Contiguous allocation, Linked allocation, and Indexed allocation
c) Implementation issues
14.2 Introduction
The main discussion here is how to allocate space to files so that disk space is effectively
utilized and files can be quickly accessed. Three major methods of allocating disk space are
in wide use and they are contiguous allocation, linked allocation and indexed allocation, each
one having its own advantages and disadvantages.
14.3 Free Space Management
To keep track of the free space, the file system maintains a free-space list which records
all disk blocks that are free. To create a file, we search the free-space list for the required
amount of space and allocate it to the new file. This space is then removed from the free-space
list. When a file is deleted, its disk space is added to the free-space list.
Bit-Vector
Frequently, the free-space list is implemented as a bit map or bit vector. Each block is
represented by one bit: if the block is free, the bit is 0; if the block is allocated, the bit is 1.
For example, consider a disk where blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26,
and 27 are free, and the rest of the blocks are allocated. The free-space bit map would be:
11000011000000111001111110001111…
The main advantage of this approach is that it is relatively simple and efficient to find n
consecutive free blocks on the disk. Unfortunately, bit vectors are inefficient unless the entire
vector is kept in memory for most accesses. Keeping it in main memory is possible for smaller
disks, such as on microcomputers, but not for larger ones.
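As a concrete illustration (a minimal sketch, using the 0 = free convention above with bits packed least-significant first), the following C fragment finds the first free block in a bit vector:

#include <stdio.h>

#define NBLOCKS 32

/* Scan the bitmap for the first free block (bit == 0).
 * Returns the block number, or -1 if every block is allocated. */
static int first_free_block(const unsigned char *bitmap)
{
    for (int b = 0; b < NBLOCKS; b++)
        if ((bitmap[b / 8] & (1 << (b % 8))) == 0)
            return b;
    return -1;
}

int main(void)
{
    /* Blocks 0 and 1 allocated, all others free. */
    unsigned char bitmap[NBLOCKS / 8] = { 0x03, 0x00, 0x00, 0x00 };
    printf("first free block: %d\n", first_free_block(bitmap));  /* prints 2 */
    return 0;
}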
Linked List
Another approach is to link all the free disk blocks together, keeping a pointer to the
first free block. This block contains a pointer to the next free disk block, and so on. In the
previous example, a pointer could be kept to block 2, as the first free block. Block 2 would
contain a pointer to block 3, which would point to block 4, which would point to block 5,
which would point to block 8, and so on. This scheme is not efficient; to traverse the list,
each block must be read, which requires substantial I/O time.
Grouping
A modification of the free-list approach is to store the addresses of n free blocks in the
first free block. The first n-1 of these are actually free. The last one is the disk address of
another block containing addresses of another n free blocks. The importance of this
implementation is that addresses of a large number of free blocks can be found quickly.
Counting
Another approach is to take advantage of the fact that, generally, several contiguous
blocks may be allocated or freed simultaneously, particularly when contiguous allocation is
used. Thus, rather than keeping a list of free disk addresses, the address of the first free block
is kept and the number n of free contiguous blocks that follow the first block. Each entry in
the free-space list then consists of a disk address and a count. Although each entry requires
more space than would a simple disk address, the overall list will be shorter, as long as the
count is generally greater than 1.
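A minimal C sketch of this counting representation (hypothetical structure, populated here with the free runs from the bit-vector example above):

/* Each entry records a run of contiguous free blocks as (address, count). */
struct free_run {
    unsigned int first_block;  /* disk address of the first free block in the run */
    unsigned int count;        /* number of contiguous free blocks in the run */
};

/* Blocks 2-5, 8-13, 17-18, and 25-27 free, as in the earlier example. */
static const struct free_run free_list[] = {
    { 2, 4 }, { 8, 6 }, { 17, 2 }, { 25, 3 },
};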
14.4 Contiguous allocation
The contiguous allocation method requires each file to occupy a set of contiguous
addresses on the disk. Disk addresses define a linear ordering on the disk. Notice that, with this
ordering, accessing block b+1 after block b normally requires no head movement. When head
movement is needed (from the last sector of one cylinder to the first sector of the next
cylinder), it is only one track. Thus, the number of disk seeks required for accessing
contiguously allocated files is minimal, as is seek time when a seek is finally needed.
Contiguous allocation of a file is defined by the disk address and the length of the first block.
If the file is n blocks long, and starts at location b, then it occupies blocks b, b+1, b+2, …,
b+n-1. The directory entry for each file indicates the address of the starting block and the
length of the area allocated for this file.
The difficulty with contiguous allocation is finding space for a new file. If the file to be
created is n blocks long, then the OS must search for n free contiguous blocks. First-fit, best-fit,
and worst-fit strategies (as discussed in Chapter 4 on multiple partition allocation) are the
most common strategies used to select a free hole from the set of available holes. Simulations
have shown that both first-fit and best-fit are better than worst-fit in terms of both time and
storage utilization. Neither first-fit nor best-fit is clearly best in terms of storage utilization,
but first-fit is generally faster.
These algorithms also suffer from external fragmentation. As files are allocated and
deleted, the free disk space is broken into little pieces. External fragmentation exists when
enough total disk space exists to satisfy a request, but this space is not contiguous; storage is
fragmented into a large number of small holes.
Another problem with contiguous allocation is determining how much disk space is
needed for a file. When the file is created, the total amount of space it will need must be
known and allocated. How does the creator (program or person) know the size of the file to
be created? In some cases, this determination may be fairly simple (e.g. copying an existing
file), but in general the size of an output file may be difficult to estimate.
14.5 Linked allocation
The problems in contiguous allocation can be traced directly to the requirement that the
spaces be allocated contiguously and that the files that need these spaces are of different
sizes. These requirements can be avoided by using linked allocation.
In linked allocation, each file is a linked list of disk blocks. The directory contains a
pointer to the first and (optionally the last) block of the file. For example, a file of 5 blocks
which starts at block 4, might continue at block 7, then block 16, block 10, and finally block
27. Each block contains a pointer to the next block and the last block contains a NIL pointer.
The value -1 may be used for NIL to differentiate it from block 0.
With linked allocation, each directory entry has a pointer to the first disk block of the
file. This pointer is initialized to nil (the end-of-list pointer value) to signify an empty file. A
write to a file removes the first free block and writes to that block. This new block is then
linked to the end of the file. To read a file, the pointers are just followed from block to block.
There is no external fragmentation with linked allocation. Any free block can be used to
satisfy a request. Notice also that there is no need to declare the size of a file when that file is
created. A file can continue to grow as long as there are free blocks. Linked allocation does
have disadvantages, however. The major problem is that it is inefficient for direct access; it is
effective only for sequential-access files. To find the ith block of a file, we must start at the
beginning of that file and follow the pointers until the ith block is reached. Note
that each access to a pointer requires a disk read.
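The cost of direct access under linked allocation can be seen in the following sketch; the read_block() helper is a hypothetical stand-in for one disk read per pointer access:

#include <stdio.h>

#define NIL (-1)

/* Hypothetical "disk": next_block[b] is the pointer stored in block b. */
static int next_block[32];

/* Stand-in for reading block b from disk and returning its next-pointer;
 * in a real system each call is one disk I/O. */
static int read_block(int b) { return next_block[b]; }

/* Find the ith block (0-based) of a file that starts at block 'first':
 * i disk reads are needed just to follow the chain. */
static int ith_block(int first, int i)
{
    int b = first;
    while (i-- > 0 && b != NIL)
        b = read_block(b);
    return b;
}

int main(void)
{
    /* The example file from the text: blocks 4 -> 7 -> 16 -> 10 -> 27. */
    next_block[4] = 7; next_block[7] = 16; next_block[16] = 10;
    next_block[10] = 27; next_block[27] = NIL;
    printf("block #3 of the file is disk block %d\n", ith_block(4, 3));  /* 10 */
    return 0;
}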
Another severe problem is reliability. A bug in the OS or a disk hardware failure might
result in pointers being lost or damaged, the effect of which could be picking up a wrong
pointer and linking it into the free list or into another file.
14.6 Indexed allocation
The indexed allocation method addresses the problems of both contiguous and
linked allocation. This is done by bringing all the pointers together into one location called
the index block. Of course, the index block will occupy some space and thus could be
considered an overhead of the method. In indexed allocation, each file has its own index
block, which is an array of disk sector addresses. The ith entry in the index block points to
the ith sector of the file. The directory contains the address of the index block of a file. To
read the ith sector of the file, the pointer in the ith index block entry is read to find the desired
sector. Indexed allocation supports direct access, without suffering from external
fragmentation. Any free block anywhere on the disk may satisfy a request for more space.
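A minimal C sketch of the lookup (hypothetical structures and sizes; a real index block would itself live on disk):

#define BLOCK_PTRS 128  /* pointers per index block; a hypothetical size */

/* Index block: the ith entry holds the disk address of the ith file block. */
struct index_block {
    int sector[BLOCK_PTRS];
};

/* The directory entry points at the file's index block. */
struct dir_entry {
    char name[14];
    struct index_block *index;
};

/* Direct access: a single table lookup, with no chain of pointers to follow. */
static int ith_sector(const struct dir_entry *f, int i)
{
    return f->index->sector[i];
}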
14.7 Implementation issues
14.7.1 Memory allocation
Dynamic memory allocation involves two central operations: malloc allocates a portion
of unused memory for use by a process, while free releases a previously allocated portion of
memory, allowing it to be reused. The operating system must allocate memory among all
running processes, but processes themselves must also allocate memory at a finer granularity.
Many of the issues are the same in each case.
So how can we implement these commands efficiently? First, let's consider an
extremely simple algorithm: fixed size allocation.
14.7.2 Fixed Sized Allocation
Assume N = 32 bytes: everything allocated is exactly 32 bytes long.
If we are given a 32 MB heap like this:
Fig. 1: 32 MB heap
1 0 0 1 … 0 0 0
Fig 2: Free bitmap (one bit per chunk)
This 32 MB heap is divided into 32-byte chunks. In order to determine which chunks
are free we need to do a little bookkeeping. Since there are 2^20 (about a million) chunks, we
need a 2^20-bit (128 KB) free bitmap to do the bookkeeping. Each bit represents a chunk in
the heap: if the bit is 1, the chunk is being used; if the bit is 0, the corresponding chunk is free
for use.
With this bitmap, the algorithm for allocating a chunk would be:
Allocate
Search through bitmap
Look for free location
Turn bit to 1
Return Chunk
Keep in mind this whole sequence should be synchronized; otherwise two processes
could attempt to allocate the same chunk at the same time, which would cause a very
dangerous race condition.
To free a memory location the algorithm would be:
Free(chunk #i)
bitmap[i/8] &= ~(1 << (i % 8));
Also note that without synchronization, if two threads free two chunks in the same
byte at the same time, one chunk might not look free.
There are both positives and negatives to this design. The positive is that it uses a
single local data structure. However, this positive is more useful for disks than memory.
Memory is much faster than disk in regards to changing to different chunks. The negative is
much more glaring in this design. It is an O(n) algorithm for allocation! This is far too
inefficient to work as memory allocation. As a result, we should look for other solutions.
One proposed solution would be to use 2 pointers: one pointer at the end of the heap
and one free pointer that points to the most recently freed chunk. It would look like this:
Fig 3: Free and End pointer implementation
The allocation of memory would go like this:
Allocate
If free < end
Return free++;
This design would have an O(1) allocation algorithm. However, there is a serious problem:
how would we ever free a chunk that was not the most recently allocated?
So what's a better solution? Make a free list: a linked list of free chunks. The head of
the free list points to the first free chunk, and each free chunk points to the next free chunk.
The design would be:
Allocate
If(free != NULL)
chunk = free
free = free-> next;
return chunk
Free
p->next = free
free = p
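In C, this free-list design might look like the following runnable sketch (assuming 32-byte chunks, a statically allocated heap, and no thread synchronization):

#include <stdio.h>
#include <stddef.h>

#define CHUNK_SIZE 32
#define NCHUNKS    1024

/* A free chunk's first bytes hold the pointer to the next free chunk, so the
 * bookkeeping lives entirely inside the free space itself. */
union chunk {
    union chunk *next;
    char data[CHUNK_SIZE];
};

static union chunk heap[NCHUNKS];
static union chunk *free_head;

static void heap_init(void)
{
    for (size_t i = 0; i + 1 < NCHUNKS; i++)
        heap[i].next = &heap[i + 1];
    heap[NCHUNKS - 1].next = NULL;
    free_head = &heap[0];
}

static void *chunk_alloc(void)  /* O(1): pop the head of the free list */
{
    union chunk *c = free_head;
    if (c != NULL)
        free_head = c->next;
    return c;
}

static void chunk_free(void *p)  /* O(1): push the chunk back onto the list */
{
    union chunk *c = p;
    c->next = free_head;
    free_head = c;
}

int main(void)
{
    heap_init();
    void *a = chunk_alloc();
    chunk_free(a);
    printf("chunk reused after free: %s\n", chunk_alloc() == a ? "yes" : "no");
    return 0;
}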
This uses the concept of using free space to maintain bookkeeping state. The more
free space we have, the more free pointers we need. Conversely, the less free space we have,
the less of a need for free pointers. As a result, this design also results in optimal use of
space. We can allocate all but one of the chunks (namely, the chunk containing the free
pointer itself). And, since all allocations return exactly one chunk, and chunks are 100%
independent of one another, the heap is nearly 100% utilizable. But a malloc() that only
works for N=32 is not much of a malloc at all! What happens when we do not have fixed
sized allocation?
Let us look at cases when N is an arbitrary number and the heap is still 32 MB. This
is called Variable Size Allocation.
14.7.3 Variable Size Allocation
In variable size allocation, unlike fixed-size allocation, we need to keep track of some
bookkeeping information for allocated chunks as well as for free chunks. In particular, the
free function must know the size of the chunk it's freeing! One common way to do this is to
store bookkeeping information immediately before the pointer returned by malloc. For
example:
typedef struct malloc_chunk {
    int sentinel;
    struct malloc_chunk *prev;
    struct malloc_chunk *next;
    char actual_data[]; /* pointer returned from malloc() points here */
} malloc_chunk_t;
Then, free can just use simple pointer arithmetic to find the bookkeeping information.
A simple heap using this bookkeeping structure might look like this.
Fig 4: Implementation of variable sized allocation.
The first component of each chunk is called the "sentinel". It shows whether the
chunk is allocated (A) or not (F). The second component has a pointer to the previous chunk,
and the third has a pointer to the next chunk. (Notice that those pointers point at the chunk
data, not at the initial sentinel!) An "X" in the second or third column indicates that there is no
previous/next chunk. The fourth component is the chunk data; we write the size of the data
there. The first three columns are 4 bytes and the fourth column is the size of allocation.
The list of chunks is kept in sorted order by address; the size of any chunk can be
calculated through pointer arithmetic using the next field. As before, free chunks are
additionally kept on their own free list, using the data areas; we don't show this list in the
diagrams.
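Given the malloc_chunk_t definition above (assumed to be in scope), free() can recover the chunk header from the user pointer with one subtraction, as in this sketch; the sentinel values and the my_free name are hypothetical:

#include <stddef.h>

#define SENTINEL_FREE  0xF
#define SENTINEL_ALLOC 0xA

void my_free(void *p)
{
    /* Step back from the user pointer to the start of the surrounding chunk. */
    malloc_chunk_t *chunk =
        (malloc_chunk_t *)((char *)p - offsetof(malloc_chunk_t, actual_data));
    chunk->sentinel = SENTINEL_FREE;  /* mark the chunk free... */
    /* ...then insert it into the free list and coalesce it with free
     * neighbors found via chunk->prev and chunk->next (omitted here). */
}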
What happens when alloc(3000) is called to an empty heap?
Fig 5: The heap after a call to alloc(3000).
What happens when we call alloc(1MB)? We will just look at the latter half of the
heap from now on.
Fig 6: The heap after a call to alloc(1MB).
What happens when we call alloc(10MB)?
Fig 7: The heap after a call to alloc(10MB).
What happens when we call free(the 1MB chunk)?
Fig 8: The heap after a call to free(the 1MB chunk).
Now, what happens when we call alloc(22MB)?
The call will fail. Although 22MB are available, the free space is split into two
chunks, a 21MB chunk and a 1MB chunk. The chunks are noncontiguous and there's no way
to move them together. The heap is fragmented, since it is unable to allocate 22MB even
though it has 22MB of free space. This type of fragmentation is called external fragmentation.
(There can also be internal fragmentation if malloc() returns chunks that are
bigger than the user requested. For example, malloc(3) might commonly return a chunk with
8 bytes of data area, rather than 3; the 5 remaining bytes are lost to internal fragmentation.)
Free space is divided into noncontiguous chunks. Allocation can fail even though there is
enough free space.
How can we solve external fragmentation? We could use compaction: the allocated data
is copied so that the free space is gathered into one contiguous region at one side of the heap.
This allows calls like alloc(22MB) to succeed. But compaction is extremely expensive. It requires copying a
ton of data, and it also requires help from the programming language so that pointer variables
can be updated correctly. Compaction is not used on OSes for modern architectures.
Compared to fixed-size allocation with N = 32 bytes, variable size allocation has more
overhead and can run into both external and internal fragmentation. To avoid these issues, we
turn to memory allocation with paging. This lets the operating system use fixed-size
allocation, with all its benefits, to provide applications with variable-size allocation!
14.7.4 Memory Allocation with PAGING
Fig 9: Virtual Address Space Implementation. 4 KB pages of a 4 GB (2^32-byte) virtual address space are mapped onto a 1 GB physical address space.
Virtual Address Spaces:
1. Reduce fragmentation by decoupling contiguous virtual pages from contiguity in the
physical address space.
2. Provide isolation between processes: each process has its own address space.
Fig 10: Paging. The %cr3 register points to the page directory, whose entries point to page tables, whose entries in turn map virtual pages to physical addresses.
Paging allows the OS to avoid external fragmentation: variable-sized allocation is built
from fixed-size allocation plus hardware-supported address indirection.
Page faults occur when you try to access an invalid address. This can happen on
execute, read, and write accesses; the CPU traps on invalid accesses.
14.7.5 Memory Mapped files
When attempting a sequential read on a file on disk, we need to use system calls
such as open(), read(), and write(). This can be quite costly if the file is rather large. Also,
sharing becomes quite the pain. One alternative is memory mapped files. Memory mapping
is when a process marks a portion of its memory as corresponding to some file. For example,
suppose you wanted to read and write to /tmp/foo. This can be accomplished by memory
mapping:
/tmp/foo -> memory address 0x10000
So when someone accesses memory address 0x10000, they are now accessing the
start of the file foo. Because of this, the function that invokes memory mapping I/O would
need to return a pointer to the start of the file. This function is called:
void *mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t off);
The mmap function will return a pointer to the starting address, also known as the
start of the file. In the *addr parameter, it allows the user to determine what address it would
like to begin at. It is often best to place NULL and allow the operating system to decide. If
the user were to place an address that the operating system does not like, an error will occur.
The len parameter is merely the length in which the user wants to map the file. The prot
parameter specifies the protection mechanism the user would like to enforce. The user can add
such flags as PROT_READ, PROT_WRITE, and PROT_EXEC as permissions for the new
memory mapped file. The parameter flags are any miscellaneous flags that the user wishes to
set. For example, to allow sharing, the user can set the MAP_SHARED flag. The parameter
fildes is the file descriptor that the opened file is on. Finally, the off parameter is the offset in
the file that you want to start mapping from. An example of an invocation of this function
would be:
addr = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, offset);
Suppose that we invoke the mmap function with a length of 1 MB and file
descriptor 2. It would cause the following effect:
Fig 11: Fault on 0x308002, loads from physical address space. Pages of the file (fd) back the virtual address range between 3MB and 4MB.
What occurs now is that the file is read into physical memory in the background.
When the process faults:
1. The OS checks whether the address is in mmapped space.
2. If it is, it loads that page of data from disk (unless it is already cached).
3. It adds a virtual memory mapping for that page.
Some advantages/positives to memory mapping include no need for copying when
not needed, and the ability to share a file amongst processes. If processes read the same file,
they can share the same physical address. However, there are also some disadvantages to
memory mapped I/O. A simple problem is that data must be page-aligned; user programs
must be careful to ensure this. A more complex problem is the need to synchronize writes
with the disk. Once a file is memory mapped, will the OS have to write out the entire file
back to disk, even if only one byte was written? Another complex problem is unwanted
sharing. Say a process P1 is reading a file, and another process P2 is writing to the same file.
Before memory-mapped I/O, P1 could get a stable version of the file by reading its data into
memory; no one could change P1's memory copy of the file data. But with memory-mapped
I/O, it's possible that P1 would see all of P2's changes to the file, as P2 made those changes!
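For illustration, here is a minimal, self-contained sketch of memory-mapped reading on a POSIX system (the file path is hypothetical and error handling is abbreviated):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/foo", O_RDONLY);  /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the whole file read-only; the kernel pages it in on demand. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* The file contents are now accessed by ordinary memory reads,
     * with no further read() system calls. */
    long lines = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (p[i] == '\n')
            lines++;
    printf("%ld lines\n", lines);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}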
14.7.6 Hardware support
After talks between software and hardware manufacturers, the hardware manufacturers were
able to improve the hardware to help support memory management. The result was a page
table entry like the one shown below:
Fig 12: Page table entry. A 20-bit physical address (page frame number) plus 12 bits of flags, including Present (P), Write/Read permission (W), User/Supervisor (U), Accessed (A), and Dirty (D) bits.
The new accessed bit and dirty bit are set by the hardware. The CPU sets the
accessed bit to 1 when a portion of memory has been read or written, and sets the dirty bit to 1
when the portion of memory has been written to. The hardware never clears these bits; the
operating system is expected to clear them as necessary.
This allows for more efficient writes to mmapped files. The operating system only
needs to write back pages of mmapped files that have the dirty bit set to 1. All other pages
have not been altered and consequently do not need to be written back to disk. This reduces
bus traffic as well as overhead. Once the write-back has occurred, the OS clears the dirty bits
on the pages of the mmapped file.
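As a concrete sketch (assuming an x86-style bit layout matching Fig 12), the flag bits can be tested and cleared with simple masks:

#include <stdint.h>

/* x86-style page table entry flags in the low 12 bits, as in Fig 12. */
#define PTE_P (1u << 0)  /* Present */
#define PTE_W (1u << 1)  /* Write/Read permission */
#define PTE_U (1u << 2)  /* User/Supervisor */
#define PTE_A (1u << 5)  /* Accessed: set by the CPU on any read or write */
#define PTE_D (1u << 6)  /* Dirty: set by the CPU on a write */

/* Does this page of an mmapped file need to be written back to disk? */
static int pte_needs_writeback(uint32_t pte)
{
    return (pte & PTE_P) && (pte & PTE_D);
}

/* The OS, not the hardware, clears the dirty bit after the write-back. */
static uint32_t pte_clear_dirty(uint32_t pte)
{
    return pte & ~PTE_D;
}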
Fig 13: Writing to a mmapped file. Virtual pages in the 3MB-4MB mapping map through physical memory to the file (fd); only dirty pages need to be written back.
Trick: When synchronizing files with the disk, the operating system clears the dirty bit on all
pages of the memory mapped file when they are first read in.
Note: Only write pages with D = 1! (Dirty bit set to 1). This reduces disk traffic for writes.
14.7.7 Copy-on-Write
Fig 14: Copy-on-Write. Two virtual addresses (VA1 and VA2) map to the same physical pages of the file (fd) until one process writes to a page.
To fix unwanted sharing, a copy-on-write algorithm can be applied. This algorithm
allows sharing of pages between two processes until one process decides to write to it. When
one process tries to write on a page, the page is copied so that each process has its own
version of the page, as depicted above.
To implement copy-on-write, the operating system marks the pages in its bookkeeping
structures as copy-on-write and sets the virtual address mappings to NON-WRITABLE.
When a process tries to write to such a page, it faults.
When the operating system catches a fault, it copies the page, and changes the
virtual address mapping to the new copied page. It sets the virtual address mapping to
WRITABLE. After this event, each process has its own copy of the page and is able to write
to its own copies.
14.8 Let us sum up
In this lesson we have learnt about
a) memory allocation
b) and the allocation and freeing of disk space
14.9 Points for discussion
a) Discuss about linked allocation
b) Discuss about indexed allocation
14.10 Model answer to Check your Progress
In order to check the progress of the student, the answer for the second question in section 14.9
is given here.
The indexed allocation method addresses the problems of both contiguous and
linked allocation. This is done by bringing all the pointers together into one location called
the index block. Of course, the index block will occupy some space and thus could be
considered an overhead of the method. In indexed allocation, each file has its own index
block, which is an array of disk sector addresses. The ith entry in the index block points to
the ith sector of the file. The directory contains the address of the index block of a file. To
read the ith sector of the file, the pointer in the ith index block entry is read to find the desired
sector. Indexed allocation supports direct access, without suffering from external
fragmentation. Any free block anywhere on the disk may satisfy a request for more space.
14.11 Lesson - end activities
After learning this lesson, try to discuss among your friends and answer these questions to
check your progress.
a) Discuss about Memory allocation
b) Discuss about free space management
14.12 References
• Charles Crowley, Chapters 16, 17 of “Operating Systems – A Design-Oriented Approach”, Tata McGraw-Hill, 2001
• H.M. Deitel, Chapter 13 of “Operating Systems”, Second Edition, Pearson Education, 2001
• Andrew S. Tanenbaum, Chapter 4 of “Modern Operating Systems”, PHI, 1996
• D.M. Dhamdhere, Chapter 17 of “Systems Programming and Operating Systems”, Tata McGraw-Hill, 1997
LESSON – 15: FILE DESCRIPTORS AND ACCESS CONTROL
CONTENTS
15.1 Aims and Objectives
15.2 Introduction
15.3 File descriptor in programming
15.4 Operations on file descriptors
15.4.1 Creating file descriptors
15.4.2 Deriving file descriptors
15.4.3 Operations on a single file descriptor
15.4.4 Operations on multiple file descriptors
15.4.5 Operations on the file descriptor table
15.4.6 Operations that modify process state
15.4.7 File locking
15.4.8 Sockets
15.4.9 Miscellaneous
15.4.10 Upcoming Operations
15.5 Access Control Matrix
15.6 Let us sum up
15.7 Points for Discussion
15.8 Model Answers to Check your Progress
15.9 Lesson - end activities
15.10 References
15.1 Aims and Objectives
In this lesson we will learn about the file descriptors and access control.
The objective of this lesson is to make the candidate aware of the following
a) file descriptors
b) operations on file descriptor
a. creating
b. deriving
c. modifying, etc.
c) access control matrix
15.2 Introduction
A file descriptor or file control block is a control block containing information the
system needs to manage a file. The file descriptor is controlled by the operating system and
is brought to the primary storage when a file is opened. A file descriptor contains
information regarding (i) symbolic file name, (ii) location of file, (iii) file organization
(sequential, indexed, etc.), (iv) device type, (v) access control data, (vi) type (data file, object
program, C source program, etc.), (vii) disposition (temporary or permanent), (viii) date and
time of creation, (ix) destroy date, (x) last modified date and time, (xi) access activity counts
(number of reads, etc.).
15.3 File descriptor in programming
In computer programming, a file descriptor is an abstract key for accessing a file.
The term is generally used in POSIX operating systems. In Microsoft Windows terminology
and in the context of the C standard I/O library, "file handle" is preferred, though the latter
case is technically a different object (see below).
In POSIX, a file descriptor is an integer, specifically of the C type int. There are 3
standard POSIX file descriptors which presumably every process (save perhaps a daemon)
should expect to have:
Integer value   Name
0               Standard Input (stdin)
1               Standard Output (stdout)
2               Standard Error (stderr)
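For instance, a process can use these descriptors directly with the POSIX read() and write() calls; the following minimal sketch copies standard input to standard output:

#include <unistd.h>

/* Copy stdin (descriptor 0) to stdout (descriptor 1), with no stdio buffering. */
int main(void)
{
    char buf[4096];
    ssize_t n;
    while ((n = read(0, buf, sizeof buf)) > 0)
        if (write(1, buf, (size_t)n) != n)
            return 1;
    return n < 0 ? 1 : 0;
}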
Generally, a file descriptor is an index for an entry in a kernel-resident data structure
containing the details of all open files. In POSIX this data structure is called a file descriptor
table, and each process has its own file descriptor table. The user application passes the
abstract key to the kernel through a system call, and the kernel will access the file on behalf
of the application, based on the key. The application itself cannot read or write the file
descriptor table directly.
In Unix-like systems, file descriptors can refer to files, directories, block or character
devices (also called "special files"), sockets, FIFOs (also called named pipes), or unnamed
pipes.
The FILE * file handle in the C standard I/O library routines is technically a pointer to
a data structure managed by those library routines; one of those structures usually includes an
actual low level file descriptor for the object in question on Unix-like systems. Since file
handle refers to this additional layer, it is not interchangeable with file descriptor.
To further complicate terminology, Microsoft Windows also uses the term file handle
to refer to the more low-level construct, akin to POSIX's file descriptors. Microsoft's C
libraries also provide compatibility functions which "wrap" these native handles to support
the POSIX-like convention of integer file descriptors as detailed above.
A program is passed a set of "open file descriptors", that is, pre-opened files. A
setuid/setgid program must deal with the fact that the user gets to select what files are open
and to what (within their permission limits). A setuid/setgid program must not assume that
opening a new file will always open into a fixed file descriptor id, or that the open will
succeed at all. It must also not assume that standard input (stdin), standard output (stdout),
and standard error (stderr) refer to a terminal or are even open.
The rationale behind this is easy; since an attacker can open or close a file descriptor
before starting the program, the attacker could create an unexpected situation. If the attacker
closes the standard output, when the program opens the next file it will be opened as though it
were standard output, and then it will send all standard output to that file as well. Some C
libraries will automatically open stdin, stdout, and stderr if they aren't already open (to
/dev/null), but this isn't true on all Unix-like systems. Also, these libraries can't be completely
depended on; for example, on some systems it's possible to create a race condition that causes
this automatic opening to fail (and still run the program).
When using ILE C/400 stream I/O functions as defined by the American National
Standards Institute (ANSI) to perform operations on a file, you identify the file through use of
pointers. When using the integrated file system C functions, you identify the file by
specifying a file descriptor. A file descriptor is a positive integer that must be unique in each
job. The job uses a file descriptor to identify an open file when performing operations on the
file. The file descriptor is represented by the variable fildes in C functions that operate on the
integrated file system and by the variable descriptor in C functions that operate on sockets.
Each file descriptor refers to an open file description, which contains information
such as a file offset, status of the file, and access modes for the file. The same open file
description can be referred to by more than one file descriptor, but a file descriptor can refer
to only one open file description.
Figure 1. File descriptor and open file description
If an ILE C/400 stream I/O function is used with the integrated file system, the ILE
C/400 run-time support converts the file pointer to a file descriptor.
When using the "root" (/), QOpenSys, or user-defined file systems, you can pass
access to an open file description from one job to another, thus allowing the job to access the
file. You do this by using the givedescriptor() or takedescriptor() function to pass the file
descriptor between jobs.
15.4 Operations on file descriptors
A modern Unix typically provides the following operations on file descriptors.
15.4.1 Creating file descriptors
open(), open64(), creat(), creat64()
socket()
socketpair()
pipe()
15.4.2 Deriving file descriptors
fileno()
dirfd()
15.4.3 Operations on a single file descriptor
read(), write()
recv(), send()
recvmsg(), sendmsg() (inc. allowing sending FDs)
sendfile()
lseek(), lseek64()
fstat(), fstat64()
fchmod()
fchown()
fdopen()
gzdopen()
ftruncate()
15.4.4 Operations on multiple file descriptors
select(), pselect()
poll(), epoll()
15.4.5 Operations on the file descriptor table
close()
dup()
dup2()
fcntl (F_DUPFD)
fcntl (F_GETFD and F_SETFD)
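As an example of an operation on the file descriptor table, the following sketch redirects standard output into a file with dup2() (the file path is hypothetical):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/log.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    dup2(fd, 1);  /* table entry 1 (stdout) now refers to the log file */
    close(fd);    /* descriptor 1 keeps the open file description alive */

    printf("this line goes to /tmp/log.txt\n");
    return 0;
}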
15.4.6 Operations that modify process state
fchdir(): sets the process's current working directory based on a directory file
descriptor
mmap(): maps ranges of a file into the process's address space
15.4.7 File locking
flock()
fcntl (F_GETLK, F_SETLK and F_SETLKW)
lockf()
15.4.8 Sockets
connect()
bind()
listen()
accept(): creates a new file descriptor for an incoming connection
getsockname()
getpeername()
getsockopt(), setsockopt()
shutdown(): shuts down one or both halves of a full duplex connection
15.4.9 Miscellaneous
ioctl(): a large collection of miscellaneous operations on a single file descriptor, often
associated with a device
15.4.10 Upcoming Operations
A series of new operations on file descriptors has been added to Solaris and Linux, as
well as numerous C libraries, to be standardized in a future version of POSIX. The at suffix
signifies that the function takes an additional first argument supplying a file descriptor from
which relative paths are resolved, the forms lacking the at suffix thus becoming equivalent to
passing a file descriptor corresponding to the current working directory.
openat()
faccessat()
fchmodat()
fchownat()
fstatat()
futimesat()
linkat()
mkdirat()
mknodat()
readlinkat()
renameat()
symlinkat()
unlinkat()
mkfifoat()
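A minimal sketch of the first of these, openat(), on a system that already provides the "at" family (the directory and file names are hypothetical):

#include <fcntl.h>    /* open, openat, O_DIRECTORY */
#include <unistd.h>   /* close */

int main(void)
{
    /* Obtain a descriptor for a directory, then resolve a path relative to it */
    int dirfd = open("/tmp", O_RDONLY | O_DIRECTORY);
    if (dirfd == -1)
        return 1;

    int fd = openat(dirfd, "notes.txt", O_RDONLY);
    /* Equivalent to open("/tmp/notes.txt", O_RDONLY), except that the result
       does not change if the directory is renamed between the two calls. */
    if (fd != -1)
        close(fd);

    close(dirfd);
    return 0;
}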
15.5 Access Control Matrix
Access Control Matrix or Access Matrix is an abstract, formal computer protection and security model used in computer systems that characterizes the rights of each subject with respect to every object in the system. It was first introduced by Butler W. Lampson in 1971, and it is the most general description of operating system protection mechanisms.
According to the model, a computer system consists of a set of objects O, the set of entities that need to be protected (e.g. processes, files, memory pages), and a set of subjects S, consisting of all active entities (e.g. users, processes). Further, there exists a set of rights R of the form r(s,o). A right thereby specifies the kind of access a subject is allowed to perform on an object.
Example
In this example matrix there exist two processes, a file, and a device. The first process has the ability to execute the second, read the file, and write some information to the device, while the second process can only send information to the first.
           Asset 1                     Asset 2                      file    device
Role 1     read, write, execute, own   execute                      read    write
Role 2     read                        read, write, execute, own
Utility
Because it does not define the granularity of protection mechanisms, the Access
Control Matrix can be used as a model of the static access permissions in any type of access
control system. It does not model the rules by which permissions can change in any particular
system, and therefore only gives an incomplete description of the system's access control
security policy.
An Access Control Matrix should be thought of only as an abstract model of
permissions at a given point in time; a literal implementation of it as a two-dimensional array
would have excessive memory requirements. Capability-based security and access control
lists are categories of concrete access control mechanisms whose static permissions can be
modeled using Access Control Matrices. Although these two mechanisms have sometimes
been presented (for example in Butler Lampson's Protection paper) as simply row-based and
column-based implementations of the Access Control Matrix, this view has been criticized as
drawing a misleading equivalence between systems that does not take into account dynamic
behaviour.
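To make the model concrete, here is a toy C sketch (not from any particular system) that stores the example matrix above literally, as a two-dimensional array of rights bitmasks; this is workable only at toy scale, for exactly the memory reasons just noted:

#include <stdio.h>

/* Rights encoded as bit flags */
enum { R_READ = 1, R_WRITE = 2, R_EXEC = 4, R_OWN = 8 };

enum { N_SUBJECTS = 2, N_OBJECTS = 4 };

/* matrix[s][o] holds the rights of subject s over object o, mirroring the
   example above: the objects are process 1, process 2, the file, and the
   device; the subjects are the two processes. */
static const unsigned matrix[N_SUBJECTS][N_OBJECTS] = {
    { R_READ | R_WRITE | R_EXEC | R_OWN, R_EXEC, R_READ, R_WRITE },
    { R_READ, R_READ | R_WRITE | R_EXEC | R_OWN, 0, 0 },
};

/* An access check is a single table lookup */
static int allowed(int subject, int object, unsigned right)
{
    return (matrix[subject][object] & right) != 0;
}

int main(void)
{
    printf("process 1 may read the file:    %d\n", allowed(0, 2, R_READ));
    printf("process 2 may write the device: %d\n", allowed(1, 3, R_WRITE));
    return 0;
}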
15.6 Let us sum up
In this lesson we have learnt about
a) file descriptors
b) the access control matrix
15.7 Points for Discussion
a) Define file descriptor
b) Explain the implementation issues of file descriptors
15.8 Model Answers to Check your Progress
In order to check the progress of the candidate, the answer to the first question in section 15.7 is given below.
A file descriptor or file control block is a control block containing information the
system needs to manage a file. The file descriptor is controlled by the operating system and
is brought to the primary storage when a file is opened. A file descriptor contains
information regarding (i) symbolic file name, (ii) location of file, (iii) file organization
(sequential, indexed, etc.), (iv) device type, (v) access control data, (vi) type (data file, object
program, C source program, etc.), (vii) disposition (temporary or permanent), (viii) date and
time of creation, (ix) destroy date, (x) last modified date and time, (xi) access activity counts
(number of reads, etc.).
15.9 Lesson - end activities
After learning this lesson, try to discuss among your friends and answer these questions to
check your progress.
a) Discuss file operations
b) Discuss the access control matrix
15.10 References
a) Charles Crowley, Chapter 16, 17 of “Operating Systems – A Design-Oriented
Approach”, Tata McGraw-Hill, 2001
b) H.M. Deitel, Chapter 13 of “Operating Systems”, Second Edition, Pearson
Education, 2001
c) Andrew S. Tanenbaum, Chapter 4 of “Modern Operating Systems”, PHI, 1996
d) D.M. Dhamdhere, Chapter 17 of “Systems Programming and Operating Systems”,
Tata McGraw-Hill, 1997
UNIT – V
LESSON – 16: MS-DOS
CONTENTS
16.1 Aims and Objectives
16.2 Introduction
16.3 Accessing hardware under DOS
16.4 Early History of MS-DOS
16.5 User’s view of MS-DOS
16.5.1 File Management
16.5.2 Change Directory
16.5.3 Make Directory
16.5.4 Copy
16.5.5 Delete and Undelete
16.5.6 Rename
16.5.7 Parent directory
16.5.8 F3
16.5.9 Breaking
16.5.10 Wildcards (*) and (?)
16.6 Executing, Viewing, Editing, Printing
16.7 Backup Files
16.8 Other commands
16.8.1 Change the Default Drive
16.8.2 Change Directory Command
16.8.3 DIR (Directory) Command
16.8.4 ERASE Command
16.8.5 FORMAT Command
16.8.6 Rebooting the computer (Ctrl-Alt-Del)
16.8.7 RENAME (REN) Command
16.8.8 RMDIR (RD) Remove Directory Command
16.8.9 Stop Execution (Ctrl-Break)
16.9 The system view of MS-DOS
16.10 The future of MS-DOS
16.11 Let us sum up
16.12 Points for Discussion
16.13 Model Answers to Check your Progress
16.14 Lesson - end activities
16.15 References
16.1 Aims and Objectives
In this lesson we will learn about the introduction to MS-DOS and its commands.
The objectives of this lesson are to make the candidate aware of the following:
a) History of MS-DOS
b) MS-DOS commands
c) User's view of MS-DOS
d) System view of MS-DOS
e) The future of MS-DOS
16.2 Introduction
An operating system was needed for IBM's 8086 line of computers, but negotiations for the use of CP/M on these broke down. IBM approached Microsoft's CEO, Bill Gates, who purchased QDOS from SCP, allegedly for $50,000. This became Microsoft Disk Operating System, MS-DOS. Microsoft also licensed the system to multiple computer companies, who sold it under their own names. Eventually, Microsoft would require the use of the MS-DOS name, with the exception of the IBM variant, which would continue to be developed concurrently and sold as PC-DOS (this was for IBM's new 'PC' using the 8088 CPU, internally the same as the 8086).
Early versions of Microsoft Windows were little more than a graphical shell for DOS,
and later versions of Windows were tightly integrated with MS-DOS. It is also possible to run
DOS programs under OS/2 and Linux using virtual-machine emulators. Because of the long
existence and ubiquity of DOS in the world of the PC-compatible platform, DOS was often
considered to be the native operating system of the PC compatible platform.
There are alternative versions of DOS, such as FreeDOS and OpenDOS. FreeDOS appeared in 1994 in response to Microsoft Windows 95, which, unlike Windows 3.11, was not merely a shell on top of DOS and largely dispensed with MS-DOS.
MS-DOS (and the IBM PC-DOS which was licensed therefrom), and its predecessor, 86-DOS, were inspired by CP/M (Control Program for Microcomputers), the dominant disk operating system for 8-bit Intel 8080 and Zilog Z80 based microcomputers. Tim Paterson at Seattle Computer Products developed a variant of CP/M-80, intended as an internal product for testing SCP's new 8086 CPU card for the S-100 bus; CP/M-80 itself could not run on the 8086, since it required the 8080 CPU. The system was named 86-DOS (it had initially been called QDOS, which stood for Quick and Dirty Operating System).
Digital Research would attempt to regain the market with DR-DOS, an MS-DOS and
CP/M hybrid. Digital Research would later be bought out by Novell, and DR DOS became
Novell DOS 7. DR DOS would later be part of Caldera (as OpenDOS), Lineo (as DR DOS),
and DeviceLogics.
Early versions of Microsoft Windows were shell programs that ran in DOS. Windows 3.11 extended the shell by going into protected mode and added 32-bit support. These were 16-bit/32-bit hybrids. Microsoft Windows 95 further reduced DOS to the role of the boot loader. Windows 98 and Windows Me were the last Microsoft operating systems to run on DOS. The DOS-based branch was eventually abandoned in favor of Windows NT, the first true 32-bit system, which became the foundation for Windows XP and Windows Vista.
Windows NT, initially NT OS/2 3.0, was the result of collaboration between
Microsoft and IBM to develop a 32-bit operating system that had high hardware and software
portability. Because of the success of Windows 3.0, Microsoft changed the application
programming interface to the extended Windows API, which caused a split between the two
companies and a branch in the operating system. IBM would continue to work on OS/2 and
OS/2 API, while Microsoft renamed its operating system Windows NT.
16.3 Accessing hardware under DOS
The operating system offers a hardware abstraction layer sufficient for developing character-based applications, but not for accessing most other hardware, such as graphics cards, printers, or mice. This required programmers to access the hardware directly, resulting in each application having its own set of device drivers for each hardware peripheral. Hardware manufacturers would release specifications to ensure that device drivers for popular applications were available.
16.4 Early History of MS-DOS
Microsoft bought non-exclusive rights for marketing 86-DOS in October 1980. In
July 1981, Microsoft bought exclusive rights for 86-DOS (by now up to version 1.14) and
renamed the operating system MS-DOS.
The first IBM-branded version, PC-DOS 1.0, was released in August 1981. It supported up to 640 kB of RAM and four 160 kB 5.25" single-sided floppy disks.
Various versions of DOS:
Version - 1.1
In May 1982, PC-DOS 1.1 added support for 320 kB double-sided floppy disks.
Version – 2.0
PC-DOS 2.0 and MS-DOS 2.0, released in March 1983, were the first versions to
support the PC/XT and fixed disk drives (commonly referred to as hard disk drives). Floppy
disk capacity was increased to 180 kB (single sided) and 360 kB (double sided) by using nine
sectors per track instead of eight.
At the same time, Microsoft announced its intention to create a GUI for DOS. Its first
version, Windows 1.0, was announced in November 1983, but was unfinished and did not
interest IBM. By November 1985, the first finished version, Microsoft Windows 1.01, was
released.
Version – 3.0
MS-DOS 3.0, released in September 1984, first supported 1.2 MB floppy disks and 32 MB hard disks.
Version - 3.1
MS-DOS 3.1, released in November that year, introduced network support.
Version - 3.2
MS-DOS 3.2, released in April 1986, was the first retail release of MS-DOS. It added
support of 720 kB 3.5" floppy disks. Previous versions had been sold only to computer
manufacturers who pre-loaded them on their computers, because operating systems were
considered part of a computer, not an independent product.
Version - 3.3
MS-DOS 3.3, released in April 1987, featured logical disks. A physical disk could be
divided into several partitions, considered as independent disks by the operating system.
Support was also added for 1.44 MB 3.5" floppy disks.
The first version of DR DOS was released in May 1988, and was compatible with MS/PC-DOS 3.3. Later versions of DR DOS would continue to identify themselves as "DOS 3.31" to applications, despite using newer version numbers.
Version - 4.0
MS-DOS 4.0, released in July 1988, supported disks up to 2 GB (disk sizes were
typically 40-60 MB in 1988), and added a full-screen shell called DOSSHELL. Other shells,
like Norton Commander and PCShell, already existed in the market. In November 1988, Microsoft addressed many bugs in a service release, MS-DOS 4.01.
DR DOS skipped version 4 due to perceived unpopularity of MS-DOS 4.x. Wishing
to get a jump on Microsoft, Digital Research released DR DOS 5 in May 1990, which included much more powerful utilities than previous DOS versions.
Version - 5.0
MS-DOS 5.0 was released in April 1991, mainly as a follow-up to DR DOS 5. It
included the full-screen BASIC interpreter QBasic, which also provided a full-screen text
editor (previously, MS-DOS had only a line-based text editor, edlin). A disk cache utility
SmartDrive, undelete capabilities, and other improvements were also included. It had severe problems with some disk utilities; these were fixed in MS-DOS 5.01, released later the same year.
Version - 6.0
MS-DOS 6.0 was released in March 1993. Following competition from Digital
Research's SuperStor, Microsoft added a disk compression utility called DoubleSpace. At the
time, typical hard disk sizes were about 200-400 MB, and many users badly needed more
disk space. It turned out that DoubleSpace contained stolen code from another compression
utility, Stacker, which led to later legal problems. MS-DOS 6.0 also featured the disk
defragmenter DEFRAG, backup program MSBACKUP, memory optimization with
MEMMAKER, and rudimentary virus protection via MSAV.
Version - 6.2
As with versions 4.0 and 5.0, MS-DOS 6.0 turned out to be buggy. Due to complaints
about loss of data, Microsoft released an updated version, MS-DOS 6.2, with an improved
DoubleSpace utility, a new disk check utility, SCANDISK (similar to fsck from Unix), and
other improvements.
December 1993 saw the release of Novell DOS 7, which was DR DOS under a new
name. Its multiple bugs, as well as DR DOS' already declining market share and Windows 95
looming on the horizon, led to low sales. By this time, PC DOS was at version 6.1, and IBM
split its development from Microsoft. From this point, the two developed independently.
Version - 6.21
The next version of MS-DOS, 6.21 (released March 1994), appeared due to legal problems. Stac Electronics sued Microsoft over the source code taken from its utility, Stacker, and forced Microsoft to remove DoubleSpace from the operating system.
Version - 6.22
In May 1994, Microsoft released MS-DOS 6.22, with another disk compression
package, DriveSpace, licensed from VertiSoft Systems. MS-DOS 6.22 was the last standalone version of MS-DOS available to the general public. MS-DOS was removed from
marketing by Microsoft on November 30, 2001.
Version - 6.23
Microsoft also released versions 6.23 to 6.25 for banks and American military
organizations. These versions introduced FAT32 support.
Version - 7.0
Microsoft Windows 95 incorporated MS-DOS version 7.0, but only as the kernel (Windows itself became the full operating system). Windows 98 also used MS-DOS 7. At this point, Microsoft announced abandonment of the DOS kernel and released Windows 2000 on the NT kernel, but following its commercial failure, released one more DOS-based Windows, Windows Me. The next system, Windows XP, was based on the NT kernel. Windows Me used MS-DOS 8; Windows XP and Vista continue to use MS-DOS 8 on emergency startup disks.
IBM released PC-DOS 7.0 in early 1995. It incorporated many new utilities such as
anti-virus software, comprehensive backup programs, PCMCIA support, and DOS Pen
extensions. Also added were new features to enhance available memory and disk space. The
last version of PC DOS was PC DOS 2000, released in 1998. Its major feature was Y2K
compatibility.
16.5 User’s view of MS-DOS
The user's view of MS-DOS deals with what the user can accomplish with the MS-DOS commands. This section deals with some of the important MS-DOS commands; the others can be found in the MS-DOS manual.
In MS-DOS, the command processor interprets user commands. The command processor is stored in the file COMMAND.COM, which is loaded automatically when MS-DOS is started. Internal commands are part of this file. External commands are brought into memory from disk as and when needed. The command processor also executes program files. Executable files have one of the extensions .COM, .EXE, or .BAT. MS-DOS prompts the user to enter commands. The standard prompt is the letter of the current drive followed by the greater-than sign.
16.5.1 File Management
Before we get started with perhaps the most fundamental aspect of the operating
system, file management, let's make sure we are at the root directory, a sort of home base.
Many commands in the DOS environment follow a certain format. In order to tell
DOS to perform a function, at the command prompt, you would type the command followed
by arguments which specify what you want DOS to do. For example:
C:\>copy practice.txt a:
"COPY" is the command that you want DOS to perform.
"PRACTICE.TXT A: " is an example of an argument which specifies what will be
affected by the command.
In this case, DOS will copy the file practice.txt from the C: drive to the A: drive.
Commands such as edit, del, rename, and many other commands require arguments similar to
the example listed above.
16.5.2 Change Directory
To move to a different directory, use the command "cd" (for "change directory")
followed by the directory you wish to move to. The backslash by itself always represents the
root directory.
C:\>cd \
Now let's create a file. At this point it is important to mention that DOS requires that
filenames be no longer than eight characters with the option of a three character extension for
descriptive purposes. Also, no spaces or slashes are acceptable. Again, some of this harkens
back to the limitations of earlier days. Since it is going to be a simple text file, let's call our
file "practice.txt".
• C:\> edit practice.txt
• Type your name or some other message.
• Then press the Alt key, followed by the F key, to display the File menu. (You can also use the mouse.) Press [ALT] + [F]
• Then press the S key to save the file. Press [S]
• To exit, press the Alt key, followed by the F key, followed by the X key. Press [ALT] + [F] + [X]
If we take a look at the root directory, the PRACTICE.TXT file will be included. DOS sees
the new file as C:\PRACTICE.TXT.
16.5.3 Make Directory
Now, let's make a couple of directories so we can put our file away in a place that
makes sense to us. The command for making a directory is "md" (for "make directory") and
is followed by the name of the directory you wish to make.
• C:\> md dosclass
(to make the directory DOSCLASS)
• C:\> dir
(to view the directory with the new DOSCLASS subdirectory)
• C:\> cd dosclass
(to change to the DOSCLASS directory)
• C:\DOSCLASS> md samples
(to make the directory "samples" inside the DOSCLASS directory)
• C:\DOSCLASS> dir
(to view the directory with the new SAMPLES subdirectory)
Note that as soon as we changed our directory, (cd dosclass) the prompt changed to
represent the new directory. Remember, if you want to get your bearings, you can take a look
at the command prompt or display a directory list of the current directory (dir).
16.5.4 Copy
Now that we have created the directories, we can put the practice file we created into
the new directory, SAMPLES (C:\DOSCLASS\SAMPLES). To keep things simple, let's "cd"
back to the root directory where the practice file is.
C:\>cd \
And now let's copy that file to the SAMPLES directory which is inside the
DOSCLASS directory. In order to copy something, you must first issue the command (copy),
then identify the file to be copied (source), and then the directory to which you wish to copy
the file (destination).
C:\>copy practice.txt dosclass\samples
A somewhat unfriendly yet useful diagram of the command format would look
something like this (where things in brackets are optional).
copy [volume+pathname+]filename [volume+pathname+]directory
What this means is that you don't have to include the volume and pathname of the
source file and the destination directory (we didn't in the first example). This is because DOS
assumes that any filename or directory included in the copy command is in the current
directory. Because we had moved to the root directory, both PRACTICE.TXT and
DOSCLASS were in the current directory.
But the nice thing about command-line interfaces is that you don't have to be in the
directory of the files you wish to act on. The command-line interface makes it possible for the
user to use one command to copy any file anywhere to any other location. From any directory
(within volume C:), we could have used the following command to copy the same file to the
same directory:
C:\DOSCLASS>copy \practice.txt \dosclass\samples
This command would perform the same function as the first command. We just told
the computer where to find the source file since it wasn't in the current directory by placing a
backslash in front of the filenames (which told the computer that the file was in the root
directory). This applies to copies between volumes as well. All you have to do is specify the
volume:
Z:\ANYWHERE>copy c:\practice.txt c:\dosclass\samples
or, a very common and useful example:
C:\>copy practice.txt a:\backup\dosclass
This command copied the file to a floppy disk in the PC's floppy drive (which already had the
directories, BACKUP\DOSCLASS).
16.5.5 Delete and Undelete
There is a slight problem now. The PRACTICE.TXT file was copied into the
SAMPLES directory, but there is no reason to have two copies of PRACTICE.TXT. To
delete a file, use the DEL command (from "delete") followed by the name of the file.
C:\>del practice.txt
Again, you can delete the file from any directory by including the full pathname.
Z:\>del c:\practice.txt
If you accidentally delete something, there is a possibility of retrieving it using the
"undelete" command. This, however, will only work for files just deleted.
C:\>undelete
A list of any files that can be undeleted will appear. You will need to replace the first
character of the file because DOS removes the first letter of a deleted file (that's how it keeps
track of what can be written over).
16.5.6 Rename
Like the above commands, you needn't be in the same directory as the file you wish to
rename provided you include the pathname of the file you wish to change. But the second
argument of this command, the new filename, will not accept a pathname designation. In
other words, the second argument should just be a filename (with no pathname):
C:\>ren \dosclass\samples\practice.txt prac.txt
Note: Being able to designate a new path would in effect allow the user to move a file
from one place to another without having to copy and delete. A common complaint about
DOS is that there is no "move" command. Therefore, the only way to move files from one
location to another is first to copy them and then to delete the originals.
16.5.7 Parent directory
If you wish to move up one level in the hierarchy of the file structure (change to the
parent directory), there is a shortcut--two consecutive periods: ".."
C:\DOSCLASS\SAMPLES>cd ..
This works in a pathname as well:
C:\DOSCLASS\SAMPLES>ren ..\practice.txt prac.txt
If the file PRACTICE.TXT were in the directory above SAMPLES (the DOSCLASS
directory), the above command would change it to PRAC.TXT.
16.5.8 F3
The F3 function key can be a time saver if you're prone to typos. Pressing the F3 key will retype the last command for you.
16.5.9 Breaking
Sometimes, you may start a procedure such as a long directory listing and wish to stop
before it is completed. The Break command often works when running batch files and other
executable files. To stop a procedure, press [CTRL] + [BREAK]. [CTRL] is located in the
lower right-hand corner of the keyboard. [BREAK] is located in the upper right hand corner
of the keyboard and is also labeled [PAUSE].
16.5.10 Wildcards (*) and (?)
Another benefit of the command-prompt interface is the ability to use wildcards. If
you want, for example, to copy only files with the .txt extension, you employ a wildcard:
C:\>copy *.txt a:\backup\txtfiles
This command would copy all of the files with the .txt extension onto a floppy disk
inside the TXTFILES directory which is inside the BACKUP directory. To copy all files
from the C drive to the A drive, you would type:
C:\>copy *.* a:
The wildcard is often used when retrieving a directory of similarly named files such as:
C:\>dir *.txt
This command would display all files ending with the .txt extension. To list all files that
begin with the letter g, you would type the following:
C:\>dir g*.*
Additionally, the ? can be used to substitute for individual letters. If there are many similarly named files that end with the letters GOP but differ in their first letter, then you can type the following to list those files:
The following command would list all files beginning with the letters REP and ending with A, with any single character in between:
C:\>dir rep?a.*
The ? wildcard can be used to replace any letter in any part of the filename. Wildcards such
as * and ? can be useful when you do not know the full name of a file or files and wish to list
them separately from the main directory listing.
16.6 Executing, Viewing, Editing, Printing
Executing
Binary files ending in .exe are usually "executed" by typing the filename as if it were
a command. The following command would execute the WordPerfect application which
appears on the disk directory as WP.EXE:
C:\APPS\WP51>wp
Binary files ending in .com often contain one or more commands for execution either
through the command prompt or through some program.
Viewing
Text files, on the other hand can be viewed quickly with the type command:
C:\>cd dosclass\samples
C:\DOSCLASS\SAMPLES>type practice.txt | more
Editing
Or you can view the file in the provided text editor (just as we did when we first created
practice.txt):
C:\DOSCLASS\SAMPLES>edit practice.txt
Printing
If you want to send a text file to the printer, there is the print command. But there are
two steps:
C:\DOSCLASS\SAMPLES>print practice.txt
Name of list device [PRN]: lpt2
If you wish to print to a networked printer, usually lpt2 is the correct response. For
local printers, the correct response is usually lpt1.
16.7 Backup Files
It is possible to lose files by mistake, although the more you practice the less likely it
becomes. For your own peace of mind, it is good practice to make backup copies of your
most valuable files on a separate diskette. Store your backup disk in a safe place and don't
carry it through a metal detector. Use the COPY command to create the backup.
There is no need to back up every file you create, only the ones in which you've invested much work. Also, prune your backup diskette every week or two using the ERASE
command. Backup files which have been made redundant by subsequent additions will
simply create clutter on your backup diskette. An effective file naming convention is essential
to keeping track of your backups.
16.8 Other commands
16.8.1 Change the Default Drive
To change the default drive, simply type the letter of the drive of your choice, followed by a colon. The new default will be listed in subsequent DOS prompts.
Example:
• C> A: [enter]
Changes the default drive from C to A.
• A> C: [enter]
Changes the default drive from A to C.
[enter] means that you must press the Enter key before the command will execute. [Enter] is required after any DOS command; it is assumed in all the commands found below.
16.8.2 Change Directory Command
Once you have located the directory you want, you may move from directory to
directory using the CD (change directory) command.
Example:
• C> cd furniture
Moves you to the directory called 'FURNITURE'.
• C> cd \furniture\chairs
Moves you to the directory called 'CHAIRS' under the directory called 'FURNITURE'.
• C> cd ..
Moves you up one level in the path.
• C> cd \
Takes you back to the root directory (C: in this case).
16.8.3 DIR (Directory) Command
The DIRECTORY command lists the names and sizes of all files located on a
particular disk.
Example:
• C> dir a:
Shows directory of drive A.
• C> dir b:
Shows directory of drive B.
• C> dir \agis
Shows files in a subdirectory on drive C (the default).
• C> dir
Shows directory of drive C.
• C> dir /w
Shows directory in wide format, as opposed to a vertical listing.
All the files are listed on the screen; you can stop the display by typing CTRL-BREAK. If you ask for a directory on the A or B drives, be sure there is a diskette in the drive and that the diskette has been formatted. If the drive is empty, or if the diskette is unformatted, DOS will respond with an error message.
16.8.4 ERASE Command
The ERASE command deletes specified files.
Example:
C> erase a:myfile.txt
Erases the file MYFILE.TXT from the diskette in the A drive. If no drive specification is entered, the system looks to delete the specified file from drive C (in this case).
IMPORTANT WARNING: This command is easy to use, but it is the most dangerous
one you will encounter in DOS (apart from FORMAT). If you aren't careful, you may delete
a file which you--or someone else--needs. And, unless you have saved a backup of that file,
the erased file is gone for good. For this reason it is good practice to use only complete file
specifications with the ERASE command (and to keep backups of your most valuable files).
As a safety precaution, never use the wild-card characters '*' and '?' in ERASE commands.
16.8.5 FORMAT Command
You must format new disks before using them on the IBM computers. The format
command checks a diskette for flaws and creates a directory where all the names of the
diskette's files will be stored.
Example:
C> format a:
Formats the diskette in the A drive.
After entering this command, follow the instructions on the screen. When the FORMAT
operation is complete, the system will ask if you wish to FORMAT more diskettes. If you are
working with only one diskette, answer N (No) and carry on with your work. If you wish to
FORMAT several diskettes, answer Y (Yes) until you have finished formatting all your
diskettes.
BEWARE: Executing the format command with a diskette which already contains files
will result in the deletion of all the contents of the entire disk. It is best to execute the format
command only on new diskettes. If you format an old diskette make sure it contains nothing
you wish to save.
16.8.6 Rebooting the computer (Ctrl-Alt-Del)
In some cases, when all attempts to recover from a barrage of error messages fail, as a last resort you can reboot the computer. To do this, you press the Control, Alt, and Delete keys all at once.
BEWARE: If you re-boot, you may lose some of your work--any data active in RAM which has not yet been saved to disk.
16.8.7 RENAME (REN) Command
The RENAME command permits users to change the name of a file without making a
copy of it.
Example:
C> ren a:goofy.txt pluto.txt
Changes the name of 'GOOFY.TXT' on the A drive to 'PLUTO.TXT'.
This command is very simple to use, just remember two points: the file name and
extension must be complete for the source file and no drive specification is given for the
target. Renaming can only occur on a single disk drive (otherwise COPY must be used).
16.8.8 RMDIR (RD) Remove Directory Command
This command removes a directory. It is only possible to execute this command if the
directory you wish to remove is empty.
Example:
C> rd mine
Removes directory called 'MINE'.
16.8.9 Stop Execution (Ctrl-Break)
If you wish to stop the computer in the midst of executing the current command, you
may use the key sequence Ctrl-Break. Ctrl-Break does not always work with non-DOS
commands. Some software packages block its action in certain situations, but it is worth
trying before you re-boot.
16.9 The system view of MS-DOS
The system's view of MS-DOS deals with the internal organization of the operating system. On booting, the boot sector is loaded into memory; it then loads the initialization routine (the first portion of DOS), which organizes memory and loads the next portion of MS-DOS. The resident code and the transient code are then loaded into low and high memory respectively. Interrupt handlers and system calls belong to the resident code, while the command processor and the internal commands belong to the transient code.
16.10 The future of MS-DOS
Early versions of Microsoft Windows were shell programs that ran in DOS. Windows 3.11 extended the shell by going into protected mode and added 32-bit support. These were 16-bit/32-bit hybrids. Microsoft Windows 95 further reduced DOS to the role of the boot loader. Windows 98 and Windows Me were the last Microsoft operating systems to run on DOS. The DOS-based branch was eventually abandoned in favor of Windows NT, the first true 32-bit system, which became the foundation for Windows XP and Windows Vista.
Windows NT, initially NT OS/2 3.0, was the result of collaboration between
Microsoft and IBM to develop a 32-bit operating system that had high hardware and software
portability. Because of the success of Windows 3.0, Microsoft changed the application
programming interface to the extended Windows API, which caused a split between the two
companies and a branch in the operating system. IBM would continue to work on OS/2 and
OS/2 API, while Microsoft renamed its operating system Windows NT.
16.11 Let us sum up
In this lesson we have learnt about
a) the history of MS-DOS
b) the various versions of MS-DOS
16.12 Points for Discussion
Try to answer the following questions:
a) Explain any five MS-DOS commands
b) Explain the user's view of MS-DOS
16.13 Model Answers to Check your Progress
In order to check your progress try to answer the following questions
a) Systems view of MS-DOS
b) How to delete a file
c) How to delete a directory
16.14 Lesson - end activities
After learning this chapter, try to discuss among your friends and answer these questions to
check your progress.
a) Discuss the future of MS-DOS
b) Discuss the various versions of MS-DOS
16.15 References
a) H.M. Deitel, Chapter 19 of “Operating Systems”, Second Edition, Pearson Education, 2001
b) Andrew S. Tanenbaum, Chapter 8 of “Modern Operating Systems”, PHI, 1996
LESSON – 17: UNIX
CONTENTS
17.1 Aims and Objectives
17.2 Introduction
17.3 History of UNIX
17.4 Hierarchical File System
17.5 The UNIX File System Organization
17.5.1 The bin Directory
17.5.2 The dev Directory
17.5.3 The etc Directory
17.5.4 The lib Directory
17.5.5 The lost+found Directory
17.5.6 The mnt and sys Directories
17.5.7 The tmp Directory
17.5.8 The usr Directory
17.6 Other Miscellaneous Stuff at the Top Level
17.7 Let us sum up
17.8 Points for Discussion
17.9 Model Answers to Check your Progress
17.10 Lesson - end activities
17.11 References
17.1 Aims and Objectives
In this lesson we will learn about the introduction to UNIX Operating system.
The objectives of this lesson are to make the candidate aware of the following:
a) History of UNIX
b) Various versions of UNIX
c) UNIX file system organization
17.2 Introduction
The Unix operating system was created more than 30 years ago by a group of
researchers at AT&T’s Bell Laboratories. During the three decades of constant development
that have followed, Unix has found a home in many places, from the ubiquitous mainframe to
home computers to the smallest of embedded devices. This lesson provides a brief overview
of the history of Unix, discusses some of the differences among the many Unix systems in
use today, and covers the fundamental concepts of the basic Unix operating system.
UNIX is a computer operating system, a control program that works with users to run
programs, manage resources, and communicate with other computer systems. Several people
can use a UNIX computer at the same time; hence UNIX is called a multiuser system. Any of
these users can also run multiple programs at the same time; hence UNIX is called
multitasking. Because UNIX is such a pastiche—a patchwork of development—it’s a lot
more than just an operating system. UNIX has more than 250 individual commands. These
range from simple commands—for copying a file, for example—to the quite complex: those
used in high-speed networking, file revision management, and software development. Most
notably, UNIX is a multichoice system. As an example, UNIX has three different primary
command-line-based user interfaces (in UNIX, the command-line user interface is called a
shell ): The three choices are the Bourne shell, C shell, and Korn shell. Often, soon after you
learn to accomplish a task with a particular command, you discover there’s a second or third
way to do that task. This is simultaneously the greatest strength of UNIX and a source of
frustration for both new and current users.
Why is having all this choice such a big deal? Think about why Microsoft MS-DOS
and the Apple Macintosh interfaces are considered so easy to use. Both are designed to give
the user less power. Both have dramatically fewer commands and precious little overlap in
commands: You can’t use copy to list your files in DOS, and you can’t drag a Mac file icon
around to duplicate it in its own directory. The advantage to these interfaces is that, in either
system, you can learn the one-and-only way to do a task and be confident that you’re as
sophisticated in doing that task as is the next person. It’s easy. It’s quick to learn. It’s exactly
how the experts do it, too.
UNIX, by contrast, is much more like a spoken language, with commands acting as
verbs, command options (which you learn about later in this lesson) acting as adjectives, and
the more complex commands acting akin to sentences. How you do a specific task can,
therefore, be completely different from how your UNIX-expert friend does the same task.
Worse, some specific commands in UNIX have many different versions, partly because of
the variations from different UNIX vendors. (You’ve heard of these variations and vendors,
I’ll bet: UNIXWare from Novell, Solaris from Sun, SCO from Santa Cruz, System V Release
4 (pronounce that “system five release four” or, to sound like an ace, “ess-vee-are-four”), and
BSD UNIX (pronounced “bee-ess-dee”) from University of California at Berkeley are the
primary players. Each is a little different from the other.) Another contributor to the sprawl of
modern UNIX is the energy of the UNIX programming community; plenty of UNIX users
decide to write a new version of a command in order to solve slightly different problems, thus
spawning many versions of a command.
In terms of computers, Unix has a long history. Unix was developed at AT&T’s Bell Laboratories after Bell Labs withdrew from a long-term collaboration with General Electric (G.E.) and MIT to create an operating system called Multics (Multiplexed Information and Computing Service) for G.E.’s mainframe. In 1969, Bell Labs researchers created the first version of Unix (then called UNICS, or Uniplexed Information and Computing Service), which has evolved into the common Unix systems of today.
Unix was gradually ported to different machine architectures from the original PDP-7
minicomputer and was used by universities. The source code was made available at a small
fee to encourage its further adoption. As Unix gained acceptance by universities, students
who used it began graduating and moving into positions where they were responsible for
purchasing systems and software. When those people began purchasing systems for their
companies, they considered Unix because they were familiar with it, spreading adoption
further. Since the first days of Unix, the operating system has grown significantly, so that it
now forms the backbone of many major corporations’ computer systems.
Unix no longer is an acronym for anything, but it is derived from the UNICS
acronym. Unix developers and users use a lot of acronyms to identify things in the system and
for commands.
Unlike DOS, Windows, OS/2, the Macintosh, VMS, MVS, and just about any other
operating system, UNIX was designed by a couple of programmers as a fun project, and it
evolved through the efforts of hundreds of programmers, each of whom was exploring his or
her own ideas of particular aspects of OS design and user interaction. In this regard, UNIX is
not like other operating systems, needless to say! It all started back in the late 1960s in a dark
and stormy laboratory deep in the recesses of the American Telephone and Telegraph
(AT&T) corporate facility in New Jersey. Working with the Massachusetts Institute of
Technology, AT&T Bell Labs was codeveloping a massive, monolithic operating system
called Multics. On the Bell Labs team were Ken Thompson, Dennis Ritchie, Brian
Kernighan, and other people in the Computer Science Research Group who would prove to
be key contributors to the new UNIX operating system.
When 1969 rolled around, Bell Labs was becoming increasingly disillusioned with
Multics, an overly slow and expensive system that ran on General Electric mainframe
computers that themselves were expensive to run and rapidly becoming obsolete. The
problem was that Thompson and the group really liked the capabilities Multics offered,
particularly the individual-user environment and multiple-user aspects.
17.3 History of UNIX
In the 1960s, the Massachusetts Institute of Technology, AT&T Bell Labs, and
General Electric worked on an experimental operating system called Multics (Multiplexed
Information and Computing Service), which was designed to run on the GE-645 mainframe
computer. The aim was the creation of a commercial product, although this was never a great
success. Multics was an interactive operating system with many novel capabilities, including
enhanced security. The project did develop production releases, but initially these releases
performed poorly.
AT&T Bell Labs pulled out and deployed its resources elsewhere. One of the
developers on the Bell Labs team, Ken Thompson, continued to develop for the GE-645
mainframe, and wrote a game for that computer called Space Travel. However, he found that
the game was too slow on the GE machine and was expensive, costing $75 per execution in
scarce computing time.
Thompson thus re-wrote the game in assembly language for Digital Equipment
Corporation's PDP-7 with help from Dennis Ritchie. This experience, combined with his
work on the Multics project, led Thompson to start a new operating system for the PDP-7.
Thompson and Ritchie led a team of developers, including Rudd Canady, at Bell Labs
developing a file system as well as the new multi-tasking operating system itself. They
included a command line interpreter and some small utility programs.
Figure: Editing a shell script using the ed editor. The dollar sign at the top of the screen is the prompt printed by the shell; typing 'ed' starts the editor, which takes over the screen from that point downward.
1970s
In 1970 the project was named Unics, and could eventually support two simultaneous users. Brian Kernighan invented this name as a contrast to Multics; the spelling was later changed to UNIX.
Up until this point there had been no financial support from Bell Labs. When the
Computer Science Research Group wanted to use UNIX on a much larger machine than the
PDP-7, Thompson and Ritchie managed to trade the promise of adding text processing
capabilities to UNIX for a PDP-11/20 machine. This led to some financial support from Bell.
In 1970, the UNIX operating system was officially named for the first time, and it ran on the PDP-11/20. It added a text formatting program called roff and a text editor. All three were
written in PDP-11/20 assembly language. Bell Labs used this initial "text processing system",
made up of UNIX, roff, and the editor, for text processing of patent applications. Roff soon
evolved into troff, the first electronic publishing program with a full typesetting capability.
1973
In 1973, the decision was made to re-write UNIX in the C programming language.
The change meant that it was easier to modify UNIX to work on other machines (thus
becoming portable), and other developers could create variations. The code was now more
concise and compact, leading to accelerated development of UNIX. AT&T made UNIX
available to universities and commercial firms, as well as the United States government under
licenses. The licenses included all source code including the machine-dependent parts of the
kernel, which were written in PDP-11 assembly code.
1975
Versions of the UNIX system were determined by editions of its user manuals, so that
(for example) "Fifth Edition UNIX" and "UNIX Version 5" have both been used to designate
the same thing. Development expanded, with Versions 4, 5, and 6 being released by 1975.
These versions added the concept of pipes, leading to the development of a more modular
code-base, increasing development speed still further. Version 5 and especially Version 6 led
to a plethora of different Unix versions both inside and outside Bell Labs, including
PWB/UNIX, IS/1 (the first commercial Unix), and the University of Wollongong's port to the
Interdata 7/32 (the first non-PDP Unix).
1978
In 1978, UNIX/32V, for the VAX system, was released. By this time, over 600
machines were running UNIX in some form.
1979
Version 7 UNIX, the last version of Research Unix to be released widely, was
released in 1979. Versions 8, 9 and 10 were developed through the 1980s but were only
released to a few universities, though they did generate papers describing the new work. This
research led to the development of Plan 9 from Bell Labs, a new portable distributed system.
1980s
Figure: A late-1980s style UNIX desktop running the X Window System graphical user interface, showing a number of client applications common to the MIT X Consortium's distribution, including Tom's Window Manager, an X terminal, xbiff, xload, and a graphical manual page browser.
AT&T now licensed UNIX System III, based largely on Version 7, for commercial
use, the first version launching in 1982. This also included support for the VAX. AT&T
continued to issue licenses for older UNIX versions. To end the confusion between all its
differing internal versions, AT&T combined them into UNIX System V Release 1. This
introduced a few features such as the vi editor and curses from the Berkeley Software
Distribution of Unix developed at the University of California, Berkeley. This also included
support for the Western Electric 3B series of machines.
Since the newer commercial UNIX licensing terms were not as favorable for
academic use as the older versions of Unix, the Berkeley researchers continued to develop
BSD Unix as an alternative to UNIX System III and V, originally on the PDP-11 architecture
(the 2.xBSD releases, ending with 2.11BSD) and later for the VAX-11 (the 4.x BSD
releases). Many contributions to UNIX first appeared on BSD systems, notably the C shell
with job control (modeled on ITS). Perhaps the most important aspect of the BSD
development effort was the addition of TCP/IP network code to the mainstream UNIX kernel.
The BSD effort produced several significant releases that contained network code: 4.1cBSD,
4.2BSD, 4.3BSD, 4.3BSD-Tahoe ("Tahoe" being the nickname of the CCI Power 6/32
architecture that was the first non-DEC release of the BSD kernel), Net/1, 4.3BSD-Reno (named to match the "Tahoe" naming and to suggest that the release was something of a gamble), Net/2, 4.4BSD,
and 4.4BSD-lite. The network code found in these releases is the ancestor of much TCP/IP
network code in use today, including code that was later released in AT&T System V UNIX
and early versions of Microsoft Windows. The accompanying Berkeley Sockets API is a de
facto standard for networking APIs and has been copied on many platforms.
1982
Other companies began to offer commercial versions of the UNIX System for their
own mini-computers and workstations. Most of these new UNIX flavors were developed
from the System V base under a license from AT&T; however, others were based on BSD
instead. One of the leading developers of BSD, Bill Joy, went on to co-found Sun
Microsystems in 1982 and create SunOS (now Solaris) for their workstation computers. In
1980, Microsoft announced its first Unix for 16-bit microcomputers called Xenix, which the
Santa Cruz Operation (SCO) ported to the Intel 8086 processor in 1983, and eventually
branched Xenix into SCO UNIX in 1989.
For a few years during this period (before PC compatible computers with MS-DOS
became dominant), industry observers expected that UNIX, with its portability and rich
capabilities, was likely to become the industry standard operating system for
microcomputers.
1984
In 1984 several companies established the X/Open consortium with the goal of
creating an open system specification based on UNIX. Despite early progress, the
standardization effort collapsed into the "Unix wars," with various companies forming rival
standardization groups. The most successful Unix-related standard turned out to be the
IEEE's POSIX specification, designed as a compromise API readily implemented on both
BSD and System V platforms, published in 1988 and soon mandated by the United States
government for many of its own systems.
1987 to 1989
AT&T added various features into UNIX System V, such as file locking, system
administration, streams, new forms of IPC, the Remote File System and TLI. AT&T
cooperated with Sun Microsystems and between 1987 and 1989 merged features from Xenix,
BSD, SunOS, and System V into System V Release 4 (SVR4), independently of X/Open.
This new release consolidated all the previous features into one package, and heralded the
end of competing versions. It also increased licensing fees.
Figure: The Common Desktop Environment (CDE), a graphical desktop for UNIX co-developed in the 1990s by HP, IBM, and Sun as part of the COSE initiative.
1990s
In 1990, the Open Software Foundation released OSF/1, their standard UNIX
implementation, based on Mach and BSD. The Foundation was started in 1988 and was
funded by several Unix-related companies that wished to counteract the collaboration of
AT&T and Sun on SVR4. Subsequently, AT&T and another group of licensees formed the
group "UNIX International" in order to counteract OSF. This escalation of conflict between
competing vendors gave rise again to the phrase "Unix wars".
1991
In 1991, a group of BSD developers (Donn Seeley, Mike Karels, Bill Jolitz, and Trent
Hein) left the University of California to found Berkeley Software Design, Inc (BSDI). BSDI
produced a fully functional commercial version of BSD Unix for the inexpensive and
ubiquitous Intel platform, which started a wave of interest in the use of inexpensive hardware
for production computing. Shortly after it was founded, Bill Jolitz left BSDI to pursue
distribution of 386BSD, the free software ancestor of FreeBSD, OpenBSD, and NetBSD.
1993
In 1993 most commercial vendors had changed their variants of UNIX to be based on
System V with many BSD features added on top. The creation of the COSE initiative that
year by the major players in UNIX marked the end of the most notorious phase of the UNIX
wars, and was followed by the merger of UI and OSF in 1994. The new combined entity,
which retained the OSF name, stopped work on OSF/1 that year. By that time the only vendor
using it was Digital, which continued its own development, rebranding their product Digital
UNIX in early 1995.
Shortly after UNIX System V Release 4 was produced, AT&T sold all its rights to
UNIX® to Novell. (Dennis Ritchie likened this to the Biblical story of Esau selling his
birthright for the proverbial "mess of pottage".) Novell developed its own version, UnixWare,
merging its NetWare with UNIX System V Release 4. Novell tried to use this to battle
against Windows NT, but their core markets suffered considerably.
In 1993, Novell decided to transfer the UNIX® trademark and certification rights to
the X/Open Consortium. In 1996, X/Open merged with OSF, creating the Open Group.
Various standards by the Open Group now define what is and what is not a "UNIX" operating
system, notably the post-1998 Single UNIX Specification.
1995
In 1995, the business of administering and supporting the existing UNIX licenses,
plus rights to further develop the System V code base, were sold by Novell to the Santa Cruz
Operation. Whether Novell also sold the copyrights is currently the subject of litigation (see
below).
1997
In 1997, Apple Computer sought out a new foundation for its Macintosh operating
system and chose NEXTSTEP, an operating system developed by NeXT. The core operating
system was renamed Darwin after Apple acquired it. It was based on the BSD family and the
Mach kernel. The deployment of Darwin BSD Unix in Mac OS X makes it, according to a
statement made by an Apple employee at a USENIX conference, the most widely used Unix-based system in the desktop computer market.
2000 to present
In 2000, SCO sold its entire UNIX business and assets to Caldera Systems, which
later on changed its name to The SCO Group. This new player then started legal action
against various users and vendors of Linux. SCO alleged that Linux contains copyrighted UNIX code now owned by The SCO Group. Other allegations include trade-secret violations by IBM, and contract violations by former Santa Cruz customers who have since converted to Linux. However, Novell disputed the SCO Group's claim to hold copyright on the UNIX source base. According to Novell, SCO (and hence the SCO Group) are effectively franchise operators for Novell, which also retained the core copyrights, veto rights over future licensing activities of SCO, and 95% of the licensing revenue. The SCO Group disagreed with this, and the dispute resulted in the SCO v. Novell lawsuit.
In 2005, Sun Microsystems released the bulk of its Solaris system code (based on
UNIX System V Release 4) into an open source project called OpenSolaris. New Sun OS
technologies such as the ZFS file system are now first released as open source code via the
OpenSolaris project; as of 2006 it has spawned several non-Sun distributions such as
SchilliX, Belenix, Nexenta, and MarTux.
The dot-com crash led to significant consolidation of UNIX users as well. Of the many commercial flavors of UNIX that were born in the 1980s, only Solaris, HP-UX, and AIX are still doing relatively well in the market, though SGI's IRIX persisted for quite some time. Of these, Solaris has the largest market share, and may be gaining popularity due to its feature set and the fact that it now has an open source version.
17.4 Hierarchical File System
In a nutshell, a hierarchy is a system organized by graded categorization. A familiar
example is the organizational structure of a company, where workers report to supervisors
and supervisors report to middle managers. Middle managers, in turn, report to senior
managers, and senior managers report to vice-presidents, who report to the president of the
company. Graphically, this hierarchy looks like Figure 1.1.
Figure 1.1. A typical organizational hierarchy
You’ve doubtless seen this type of illustration before, and you know that a higher
position indicates more control. Each position is controlled by the next highest position or
row. The president is top dog of the organization, but each subsequent manager is also in
control of his or her own small fiefdom.
To understand how a file system can have a similar organization, simply imagine each
of the managers in the illustration as a “file folder” and each of the employees as a piece of
paper, filed in a particular folder. Open any file cabinet, and you probably see things
organized this way: filed papers are placed in labeled folders, and often these folders are filed in groups under specific topics. The drawer might then have a specific label to distinguish it from other drawers in the cabinet, and so on.
That’s exactly what a hierarchical file system is all about. You want to have your files
located in the most appropriate place in the file system, whether at the very top, in a folder, or
in a nested series of folders. With careful usage, a hierarchical file system can contain
hundreds or thousands of files and still allow users to find any individual file quickly. On my
computer, the chapters of this book are organized in a hierarchical fashion, as shown in
Figure 1.2.
Figure 1.2. File organization for the chapters of Teach
17.5 The UNIX File System Organization
A key concept enabling the UNIX hierarchical file system to be so effective is that anything that is not a folder is a file. Programs are files in UNIX, device drivers are files, documents and spreadsheets are files, your keyboard is represented as a file, your display is a file, and even your tty line and mouse are files.
What this means is that as UNIX has developed, it has avoided becoming an ungainly mess. UNIX does not have hundreds of cryptic files stuck at the top (this is still a problem in DOS) or tucked away in confusing folders within the System Folder (as with the Macintosh). The top level of the UNIX file structure (/) is known as the root directory or slash directory, and it always has a certain set of subdirectories, including bin, dev, etc, lib, mnt, tmp, and usr. There can be a lot more, however. Listing 1.1 shows files found at the top level of the mentor file system (the system I work on). Typical UNIX directories are shown followed by a slash in the listing.
AA boot flags/ rf/ userb/ var/
OLD/ core gendynix stand/ userc/
archive/ dev/ lib/ sys/ users/
ats/ diag/ lost+found/ tftpboot/ usere/
backup/ dynix mnt/ tmp/ users/
bin/ etc/ net/ usera/ usr/
You can obtain a listing of the files and directories in your own top-level directory by
using the ls -C -F / command. (You’ll learn all about the ls command in the next hour. For
now, just be sure that you enter exactly what’s shown in the example.) On a different
computer system, here’s what I see when I enter that command:
% ls -C -F /
Mail/ export/ public/
News/ home/ reviews/
add_swap/ kadb* sbin/
apps/ layout sys@
archives/ lib@ tftpboot/
bin@ lost+found/ tmp/
boot mnt/ usr/
cdrom/ net/ utilities/
chess/ news/ var/
dev/ nntpserver vmunix*
etc/ pcfs/
In this example, any filename that ends with a slash (/) is a folder (UNIX calls these
directories). Any filename that ends with an asterisk (*) is a program. Anything ending with
an at sign (@) is a symbolic link, and everything else is a normal, plain file. As you can see
from these two examples, and as you’ll immediately find when you try the command
yourself, there is much variation in how different UNIX systems organize the top level
directory. There are some directories and files in common, and once you start examining the
contents of specific directories, you’ll find that hundreds of programs and files always show
up in the same place from UNIX to UNIX. It’s as if you were working as a file clerk at a new
law firm. Although this firm might have a specific approach to filing information, the
approach may be similar to the filing system of other firms where you have worked in the
past. If you know the underlying organization, you can quickly pick up the specifics of a
particular organization. Try the command ls -C -F / on your computer system, and identify,
as previously explained, each of the directories in your resultant listing. The output of the ls
command shows the files and directories in the top level of your system. Next, you learn what
they are.
17.5.1 The bin Directory
In UNIX parlance, programs are considered executables because users can execute
them. (In this case, execute is a synonym for run, not an indication that you get to wander
about murdering innocent applications!) When the program has been compiled (usually from
a C listing), it is translated into what’s called a binary format. Add the two together, and you
have a common UNIX description for an application—an executable binary.
It’s no surprise that the original UNIX developers decided to have a directory labeled
“binaries” to store all the executable programs on the system. Remember the primitive
teletypewriter discussed earlier? Having a slow system to talk with the computer had
many ramifications that you might not expect. The single most obvious one was that
everything became quite concise. There were no lengthy words like binaries or listfiles, but
rather succinct abbreviations: bin and ls are, respectively, the UNIX equivalents. The bin
directory is where all the executable binaries were kept in early UNIX. Over time, as more
and more executables were added to UNIX, having all the executables in one place proved
unmanageable, and the bin directory split into multiple parts (/bin, /sbin, /usr/bin).
17.5.2 The dev Directory
Among the most important portions of any computer are its device drivers. Without
them, you wouldn’t have any information on your screen (the information arrives courtesy of
the display device driver). You wouldn’t be able to enter information (the information is read
and given to the system by the keyboard device driver), and you wouldn’t be able to use your
floppy disk drive (managed by the floppy device driver).
Earlier, you learned how almost anything in UNIX is considered a file in the file
system, and the dev directory is an example. All device drivers—often numbering into the
hundreds—are stored as separate files in the standard UNIX dev (devices) directory.
Pronounce this directory name “dev,” not “dee-ee-vee.”
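You can see one of these device files for yourself. For example, on virtually every UNIX
system the file /dev/tty represents your own terminal, so a long listing such as
ls -l /dev/tty
will show a line beginning with the letter c (marking a character device); the owner,
permissions and device numbers shown will vary from system to system.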
17.5.3 The etc Directory
UNIX administration can be quite complex, involving management of user accounts,
the file system, security, device drivers, hardware configurations, and more. To help, UNIX
designates the etc directory as the storage place for all administrative files and information.
Pronounce the directory name either “ee-tea-sea”, “et-sea,” or “etcetera.” All three
pronunciations are common.
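Two administrative files found in etc on virtually every UNIX system are passwd, which
holds account information, and group, which defines groups of users. You can confirm that
they exist on your own system with
ls /etc/passwd /etc/group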
17.5.4 The lib Directory
Like your neighborhood library, UNIX has a central storage place for function
and procedural libraries. These specific executables are included with specific programs,
allowing programs to offer features and capabilities otherwise unavailable. The idea is that if
programs want to include certain features, they can reference just the shared copy of that
utility in the UNIX library rather than having a new, unique copy.
Many of the more recent UNIX systems also support what’s called dynamic linking,
where the library of functions is included on-the-fly as you start up the program. The wrinkle
is that instead of the library reference being resolved when the program is created, it’s
resolved only when you actually run the program itself. Pronounce the directory name “libe”
or “lib” (to rhyme with the word bib).
17.5.5 The lost+found Directory
With multiple users running many different programs simultaneously, it’s been a
challenge over the years to develop a file system that can remain synchronized with the
activity of the computer. Various parts of the UNIX kernel—the brains of the system—help
with this problem. When files are recovered after any sort of problem or failure, they are
placed here, in the lost+found directory, if the kernel cannot ascertain the proper location in
the filesystem. This directory should be empty almost all the time. This directory is
commonly pronounced “lost and found” rather than “lost plus found.”
17.5.6 The mnt and sys Directories
The mnt (pronounced “em-en-tea”) and sys (pronounced “sis”) directories also are
safely ignored by UNIX users. The mnt directory is intended to be a common place to mount
external media—hard disks, removable cartridge drives, and so on—in UNIX. On many
systems, though not all, sys contains files indicating the system configuration.
17.5.7 The tmp Directory
A directory that you can’t ignore, the tmp directory—say “temp”—is used by many of
the programs in UNIX as a temporary file-storage space. If you’re editing a file, for example,
the program makes a copy of the file and saves it in tmp, and you work directly with that,
saving the new file back to your original file only when you’ve completed your work. On
most systems, tmp ends up littered with various files and executables left by programs that
don’t remove their own temporary files. On one system I use, it’s not uncommon to find 10–
30 megabytes of files wasting space here. Even so, if you’re manipulating files or working
with copies of files, tmp is the best place to keep the temporary copies of files. Indeed, on
some UNIX workstations, tmp actually can be the fastest device on the computer, allowing
for dramatic performance improvements over working with files directly in your home
directory.
17.5.8 The usr Directory
Finally, the last of the standard directories at the top level of the UNIX file system
hierarchy is the usr—pronounced “user”—directory. Originally, this directory was intended
to be the central storage place for all user-related commands. Today, however, many
companies have their own interpretation, and there’s no telling what you’ll find in this
directory.
17.6 Other Miscellaneous Stuff at the Top Level
Besides all the directories previously listed, a number of other directories and files
commonly occur in UNIX systems. Some files might have slight variations in name on your
computer, so when you compare your listing to the following files and directories, be alert for
possible alternative spellings.
A file you must have to bring up UNIX at all is one usually called unix or vmunix, or
named after the specific version of UNIX on the computer. The file contains the actual UNIX
operating system. The file must have a specific name and must be found at the top level of
the file system. Hand-in-hand with the operating system is another file called boot, which
helps during initial startup of the hardware.
Notice on one of the previous listings that the files boot and dynix appear. (DYNIX is
the name of the particular variant of UNIX used on Sequent computers.) By comparison, the
listing from the Sun Microsystems workstation shows boot and vmunix as the two files.
Another directory that you might find in your own top-level listing is diag—pronounced
“dye-ag”—which acts as a storehouse for diagnostic and maintenance programs. If you have
any programs within this directory, it’s best not to try them out without proper training!
The home directory, also sometimes called users, is a central place for organizing all files
unique to a specific user. Listing this directory is usually an easy way to find out what
accounts are on the system, too, because by convention each individual account directory is
named after the user’s account name. On one system I use, my account is taylor, and my
individual account directory is also called taylor. Home directories are always created by the
system administrator.
The net directory, if set up correctly, is a handy shortcut for accessing other computers on
your network. The tftpboot directory is a relatively new feature of UNIX. The letters stand
for “Trivial File Transfer Protocol boot.” Don’t let the name confuse you, though; this
directory contains versions of the kernel suitable for X Window System-based terminals and
diskless workstations to run UNIX.
Some UNIX systems have directories named for specific types of peripherals that can be
attached. On the Sun workstation, you can see examples with the directories cdrom and
pcfs. The former is for a CD-ROM drive and the latter for DOS-format floppy disks. There
are many more directories in UNIX, but this will give you an idea of how things are
organized.
17.7 Let us sum up
In this lesson we have learnt about
a) The History of UNIX
b) And UNIX file system organization
17.8 Points for Discussion
Try to discuss the following:
a) Evolution of UNIX
b) Future of UNIX
17.9 Model Answers to Check your Progress
In order to check your progress, try to describe the uses of the following directories:
a) bin directory
b) dev directory
c) lib directory
17.10 Lesson - end activities
After learning this chapter, try to discuss among your friends and answer these questions
to check your progress.
b) Discuss the UNIX file system organization
c) Discuss the various versions of UNIX
17.11 References
 H.M. Deitel, Chapter 18 of “Operating Systems”, Second Edition, Pearson
Education, 2001
 Andrew S. Tanenbaum, Chapter 7 of “Modern Operating Systems”, PHI, 1996
LESSON – 18: KERNEL AND SHELL
CONTENTS
18.1 Aims and Objectives
18.2 Introduction
18.2.1 The kernel
18.2.2 The shell
18.3 Simple commands
18.4 Background commands
18.5 Input output redirection
18.6 Pipelines and filters
18.7 File name generation
18.8 Quoting
18.9 Prompting
18.10 Shell procedures
18.11 Control Flow
18.12 Shell variables
18.13 The test command
18.14 Other Control flows
18.15 Command grouping
18.16 Debugging shell procedures
18.17 Other important commands
18.18 Let us sum up
18.19 Points for Discussion
18.20 Model Answers to Check your Progress
18.21 Lesson - end activities
18.22 References
18.1 Aims and Objectives
In this lesson we will learn about the introduction to Kernel and Shell.
The objectives of this lesson are to make the candidate aware of the following:
l) Kernel
m) Shell
n) And some important shell commands
18.2 Introduction
In this section we discuss the kernel and the shell.
18.2.1 The kernel
The kernel of UNIX is the hub of the operating system: it allocates time and memory
to programs and handles the filestore and communications in response to system calls. The
kernel is responsible for carrying out all fundamental low-level system operations, like
scheduling processes, opening and closing files, and sending instructions to the actual
hardware CPU chips that process your data. There is only one kernel and it is the heart of the
machine.
As an illustration of the way that the shell and the kernel work together, suppose a
user types rm myfile (which has the effect of removing the file myfile). The shell searches
the filestore for the file containing the program rm, and then requests the kernel, through
system calls, to execute the program rm on myfile. When the process rm myfile has finished
running, the shell then returns the UNIX prompt % to the user, indicating that it is waiting for
further commands.
18.2.2 The shell
The shell acts as an interface between the user and the kernel. When a user logs in, the
login program checks the username and password, and then starts another program called the
shell. The shell is a command line interpreter (CLI). It interprets the commands the user types
in and arranges for them to be carried out. The commands are themselves programs: when
they terminate, the shell gives the user another prompt (% on our systems).
The adept user can customise his/her own shell, and users can use different shells on
the same machine. Many systems provide the tcsh shell as the default.
The tcsh shell has certain features to help the user input commands.
Filename Completion - By typing part of the name of a command, filename or directory and
pressing the [Tab] key, the tcsh shell will complete the rest of the name automatically. If the
shell finds more than one name beginning with those letters you have typed, it will beep,
prompting you to type a few more letters before pressing the tab key again.
History - The shell keeps a list of the commands you have typed in. If you need to repeat a
command, use the cursor keys to scroll up and down the list or type history for a list of
previous commands.
The shell is both a command language and a programming language that provides an
interface to the UNIX operating system.
18.3 Simple commands
Simple commands consist of one or more words separated by blanks. The first word is
the name of the command to be executed; any remaining words are passed as arguments to
the command. For example,
who
is a command that prints the names of users logged in. The command
ls -l
prints a list of files in the current directory. The argument -l tells ls to print status
information, size and the creation date for each file.
18.4 Background commands
To execute a command the shell normally creates a new process and waits for it to
finish. A command may be run without waiting for it to finish. For example,
cc pgm.c &
calls the C compiler to compile the file pgm.c. The trailing & is an operator that instructs the
shell not to wait for the command to finish. To help keep track of such a process the shell
reports its process number following its creation. A list of currently active processes may be
obtained using the ps command.
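For example, a session might look like this (the process number your shell reports will of
course differ):
cc pgm.c &
1234
Here 1234 is the process number the shell assigned to the background compilation.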
18.5 Input output redirection
Most commands produce output on the standard output that is initially connected to
the terminal. This output may be sent to a file by writing, for example,
ls -l >file
The notation >file is interpreted by the shell and is not passed as an argument to ls. If
file does not exist then the shell creates it; otherwise the original contents of file are replaced
with the output from ls. Output may be appended to a file using the notation
ls -l >>file
In this case file is also created if it does not already exist.
The standard input of a command may be taken from a file instead of the terminal by
writing, for example,
wc <file
The command wc reads its standard input (in this case redirected from file) and prints
the number of characters, words and lines found. If only the number of lines is required then
wc -l <file
could be used.
18.6 Pipelines and filters
The standard output of one command may be connected to the standard input of
another by writing the `pipe' operator, indicated by |, as in,
ls -l | wc
Two commands connected in this way constitute a pipeline and the overall effect is
the same as
ls -l >file; wc <file
except that no file is used. Instead the two processes are connected by a pipe and are run in
parallel.
Pipes are unidirectional and synchronization is achieved by halting wc when there is
nothing to read and halting ls when the pipe is full.
A filter is a command that reads its standard input, transforms it in some way, and
prints the result as output. One such filter, grep, selects from its input those lines that contain
some specified string. For example,
ls | grep old
prints those lines, if any, of the output from ls that contain the string old. Another useful filter
is sort. For example,
who | sort
will print an alphabetically sorted list of logged in users.
A pipeline may consist of more than two commands, for example,
ls | grep old | wc -l
prints the number of file names in the current directory containing the string old.
18.7 File name generation
Many commands accept arguments which are file names. For example,
ls -l main.c
prints information relating to the file main.c.
The shell provides a mechanism for generating a list of file names that match a
pattern. For example,
ls -l *.c
generates, as arguments to ls, all file names in the current directory that end in .c. The
character * is a pattern that will match any string including the null string. In general patterns
are specified as follows.
*
Matches any string of characters including the null string.
?
Matches any single character.
[...]
Matches any one of the characters enclosed. A pair of characters separated by a minus
will match any character lexically between the pair.
For example,
[a-z]*
matches all names in the current directory beginning with one of the letters a through z.
/usr/fred/test/?
matches all names in the directory /usr/fred/test that consist of a single character. If no file
name is found that matches the pattern then the pattern is passed, unchanged, as an argument.
This mechanism is useful both to save typing and to select names according to some
pattern. It may also be used to find files. For example,
echo /usr/fred/*/core
finds and prints the names of all core files in sub-directories of /usr/fred. (echo is a standard
UNIX command that prints its arguments, separated by blanks.) This last feature can be
expensive, requiring a scan of all sub-directories of /usr/fred.
There is one exception to the general rules given for patterns. The character `.' at the
start of a file name must be explicitly matched.
echo *
will therefore echo all file names in the current directory not beginning with `.'.
echo .*
will echo all those file names that begin with `.'. This avoids inadvertent matching of the
names `.' and `..' which mean `the current directory' and `the parent directory' respectively.
(Notice that ls suppresses information for the files `.' and `..'.)
18.8 Quoting
Characters that have a special meaning to the shell, such as < > * ? | &, are called
metacharacters. Any character preceded by a \ is quoted and loses its special meaning, if any.
The \ is elided so that
echo \?
will echo a single ?, and
echo \\
will echo a single \. To allow long strings to be continued over more than one line the
sequence \newline is ignored.
\ is convenient for quoting single characters. When more than one character needs quoting the
above mechanism is clumsy and error prone. A string of characters may be quoted by
enclosing the string between single quotes. For example,
echo xx'****'xx
will echo
xx****xx
The quoted string may not contain a single quote but may contain newlines, which are
preserved. This quoting mechanism is the most simple and is recommended for casual use. A
third quoting mechanism using double quotes is also available that prevents interpretation of
some but not all metacharacters.
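As an illustration of the difference, assume the variable user has been set to fred. Then
echo '$user'
prints the literal string $user, whereas
echo "$user"
prints fred, because double quotes permit parameter and command substitution but still
prevent file name generation.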
18.9 Prompting
When the shell is used from a terminal it will issue a prompt before reading a
command. By default this prompt is `$ '. It may be changed by saying, for example,
PS1=yesdear
that sets the prompt to be the string yesdear. If a newline is typed and further input is needed
then the shell will issue the prompt `> '. Sometimes this can be caused by mistyping a quote
mark. If it is unexpected then an interrupt (DEL) will return the shell to read another
command. This prompt may be changed by saying, for example,
PS2=more
The shell and login
Following login (1) the shell is called to read and execute commands typed at the
terminal. If the user's login directory contains the file .profile then it is assumed to contain
commands and is read by the shell before reading any commands from the terminal.
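A minimal .profile might contain assignments such as the following (the directory and file
names here are purely illustrative):
PATH=:/usr/fred/bin:/bin:/usr/bin
MAIL=/usr/mail/fred
PS1='fred$ '
The variables PATH, MAIL and PS1 are described under shell variables later in this lesson.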
18.10 Shell procedures
The shell may be used to read and execute commands contained in a file. For
example,
sh file [ args ... ]
calls the shell to read commands from file. Such a file is called a command procedure or shell
procedure. Arguments may be supplied with the call and are referred to in file using the
positional parameters $1, $2, .... For example, if the file wg contains
who | grep $1
then
sh wg fred
is equivalent to
who | grep fred
UNIX files have three independent attributes, read, write and execute. The UNIX
command chmod (1) may be used to make a file executable. For example,
chmod +x wg
will ensure that the file wg has execute status. Following this, the command
wg fred
is equivalent to
sh wg fred
This allows shell procedures and programs to be used interchangeably. In either case a
new process is created to run the command.
As well as providing names for the positional parameters, the number of positional
parameters in the call is available as $#. The name of the file being executed is available as
$0.
A special shell parameter $* is used to substitute for all positional parameters except $0. A
typical use of this is to provide some default arguments, as in,
nroff -T450 -ms $*
which simply prepends some arguments to those already given.
18.11 Control Flow
Control flow - for
A frequent use of shell procedures is to loop through the arguments ($1, $2, ...)
executing commands once for each argument. An example of such a procedure is tel that
searches the file /usr/lib/telnos that contains lines of the form
...
fred mh0123
bert mh0789
...
The text of tel is
for i
do grep $i /usr/lib/telnos; done
The command
tel fred
prints those lines in /usr/lib/telnos that contain the string fred.
tel fred bert
prints those lines containing fred followed by those for bert.
The for loop notation is recognized by the shell and has the general form
for name in w1 w2 ...
do command-list
done
A command-list is a sequence of one or more simple commands separated or
terminated by a newline or semicolon. Furthermore, reserved words like do and done are
only recognized following a newline or semicolon. name is a shell variable that is set to the
words w1 w2 ... in turn each time the command-list following do is executed. If in w1 w2 ... is
omitted then the loop is executed once for each positional parameter; that is, in $* is
assumed.
Another example of the use of the for loop is the create command whose text is
for i do >$i; done
The command
create alpha beta
ensures that two empty files alpha and beta exist and are empty. The notation >file may be
used on its own to create or clear the contents of a file. Notice also that a semicolon (or
newline) is required before done.
Control flow - case
A multiple way branch is provided for by the case notation. For example,
case $# in
1) cat >>$1 ;;
2) cat >>$2 <$1 ;;
*) echo \'usage: append [ from ] to\' ;;
esac
is an append command. When called with one argument as
append file
$# is the string 1 and the standard input is copied onto the end of file using the cat command.
append file1 file2
appends the contents of file1 onto file2. If the number of arguments supplied to append is
other than 1 or 2 then a message is printed indicating proper usage.
The general form of the case command is
case word in
pattern) command-list;;
...
esac
The shell attempts to match word with each pattern, in the order in which the patterns
appear. If a match is found the associated command-list is executed and execution of the case
is complete. Since * is the pattern that matches any string it can be used for the default case.
A word of caution: no check is made to ensure that only one pattern matches the case
argument. The first match found defines the set of commands to be executed. In the example
below the commands following the second * will never be executed.
case $# in
*) ... ;;
*) ... ;;
esac
Another example of the use of the case construction is to distinguish between
different forms of an argument. The following example is a fragment of a cc command.
for i
do case $i in
-[ocs])
... ;;
-*) echo \'unknown flag $i\' ;;
*.c) /lib/c0 $i ... ;;
*) echo \'unexpected argument $i\' ;;
esac
done
To allow the same commands to be associated with more than one pattern the case
command provides for alternative patterns separated by a |. For example,
case $i in
-x|-y) ...
esac
is equivalent to
case $i in
-[xy]) ...
esac
The usual quoting conventions apply so that
case $i in
\?)
...
will match the character ?.
18.12 Shell variables
The shell provides string-valued variables. Variable names begin with a letter and
consist of letters, digits and underscores. Variables may be given values by writing, for
example,
user=fred box=m000 acct=mh0000
which assigns values to the variables user, box and acct. A variable may be set to the null
string by saying, for example,
null=
The value of a variable is substituted by preceding its name with $; for example,
echo $user
will echo fred.
Variables may be used interactively to provide abbreviations for frequently used
strings. For example,
b=/usr/fred/bin
mv pgm $b
will move the file pgm from the current directory to the directory /usr/fred/bin. A more
general notation is available for parameter (or variable) substitution, as in,
echo ${user}
which is equivalent to
echo $user
and is used when the parameter name is followed by a letter or digit. For example,
tmp=/tmp/ps
ps a >${tmp}a
will direct the output of ps to the file /tmp/psa, whereas,
ps a >$tmpa
would cause the value of the variable tmpa to be substituted.
Except for $? the following are set initially by the shell. $? is set after executing each
command.
$?
The exit status (return code) of the last command executed as a decimal string. Most
commands return a zero exit status if they complete successfully, otherwise a nonzero exit status is returned. Testing the value of return codes is dealt with later under
if and while commands.
$#
The number of positional parameters (in decimal). Used, for example, in the append
command to check the number of parameters.
$$
The process number of this shell (in decimal). Since process numbers are unique
among all existing processes, this string is frequently used to generate unique
temporary file names. For example,
ps a >/tmp/ps$$
...
rm /tmp/ps$$
$!
The process number of the last process run in the background (in decimal).
$-
The current shell flags, such as -x and -v.
Some variables have a special meaning to the shell and should be avoided for general use.
$MAIL
When used interactively the shell looks at the file specified by this variable before it
issues a prompt. If the specified file has been modified since it was last looked at the
shell prints the message you have mail before prompting for the next command. This
variable is typically set in the file .profile, in the user's login directory. For example,
MAIL=/usr/mail/fred
$HOME
The default argument for the cd command. The current directory is used to resolve
file name references that do not begin with a /, and is changed using the cd command.
For example,
cd /usr/fred/bin
makes the current directory /usr/fred/bin.
cat wn
will print on the terminal the file wn in this directory. The command cd with no
argument is equivalent to
cd $HOME
This variable is also typically set in the user's login profile.
$PATH
A list of directories that contain commands (the search path). Each time a command
is executed by the shell a list of directories is searched for an executable file. If
$PATH is not set then the current directory, /bin, and /usr/bin are searched by
default. Otherwise $PATH consists of directory names separated by :. For example,
PATH=:/usr/fred/bin:/bin:/usr/bin
specifies that the current directory (the null string before the first :), /usr/fred/bin,
/bin and /usr/bin are to be searched in that order. In this way individual users can
have their own `private' commands that are accessible independently of the current
directory. If the command name contains a / then this directory search is not used; a
single attempt is made to execute the command.
$PS1
The primary shell prompt string, by default, `$ '.
$PS2
The shell prompt when further input is needed, by default, `> '.
$IFS
The set of characters used for blank interpretation.
18.13 The test command
The test command, although not part of the shell, is intended for use by shell
programs. For example,
test -f file
returns zero exit status if file exists and non-zero exit status otherwise. In general test
evaluates a predicate and returns the result as its exit status. Some of the more frequently
used test arguments are given here, see test (1) for a complete specification.
test s
true if the argument s is not the null string
test -f file
true if file exists
test -r file
true if file is readable
test -w file
true if file is writable
test -d file
true if file is a directory
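For example, since /tmp is a directory on most UNIX systems,
test -d /tmp
echo $?
should print 0 (success), whereas
test -f /tmp/no-such-file
echo $?
should print a non-zero value, assuming no file by that name exists.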
18.14 Other Control flows
Control flow - while
The actions of the for loop and the case branch are determined by data available to
the shell. A while or until loop and an if then else branch are also provided whose actions
are determined by the exit status returned by commands. A while loop has the general form
while command-list1
do command-list2
done
The value tested by the while command is the exit status of the last simple command
following while. Each time round the loop command-list1 is executed; if a zero exit status is
returned then command-list2 is executed; otherwise, the loop terminates. For example,
while test $1
do ...
shift
done
is equivalent to
for i
do ...
done
shift is a shell command that renames the positional parameters $2, $3, ... as $1, $2, ... and
loses $1.
Another kind of use for the while/until loop is to wait until some external event
occurs and then run some commands. In an until loop the termination condition is reversed.
For example,
until test -f file
do sleep 300; done
commands
will loop until file exists. Each time round the loop it waits for 5 minutes before trying again.
(Presumably another process will eventually create the file.)
Control flow - if
Also available is a general conditional branch of the form,
if command-list
then command-list
else command-list
fi
that tests the value returned by the last simple command following if.
The if command may be used in conjunction with the test command to test for the
existence of a file as in
if test -f file
then process file
else do something else
fi
An example of the use of the if, case and for constructions together is given in the touch command below.
A multiple test if command of the form
if ...
then ...
else if ...
then
...
else
if ...
...
fi
fi
fi
may be written using an extension of the if notation as,
if ...
then ...
elif ...
then ...
elif ...
...
fi
The following example is the touch command which changes the `last modified' time
for a list of files. The command may be used in conjunction with make (1) to force
recompilation of a list of files.
flag=
for i
do case $i in
-c) flag=N ;;
*) if test -f $i
then ln $i junk$$; rm junk$$
elif test $flag
then echo file \'$i\' does not exist
else >$i
fi
esac
done
The -c flag is used in this command to force subsequent files to be created if they do
not already exist. Otherwise, if the file does not exist, an error message is printed. The shell
variable flag is set to some non-null string if the -c argument is encountered. The commands
ln ...; rm ...
make a link to the file and then remove it thus causing the last modified date to be updated.
The sequence
if command1
then command2
fi
may be written
command1 && command2
Conversely,
command1 || command2
executes command2 only if command1 fails. In each case the value returned is that of the last
simple command executed.
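For example, using a hypothetical file called draft,
test -f draft && wc -l <draft
counts the lines of draft only if the file exists, and
test -f draft || echo draft missing
prints a message only if it does not.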
18.15 Command grouping
Commands may be grouped in two ways,
{ command-list ; }
and
( command-list )
In the first command-list is simply executed. The second form executes command-list
as a separate process. For example,
(cd x; rm junk )
executes rm junk in the directory x without changing the current directory of the invoking
shell.
The commands
cd x; rm junk
have the same effect but leave the invoking shell in the directory x.
18.16 Debugging shell procedures
The shell provides two tracing mechanisms to help when debugging shell procedures.
The first is invoked within the procedure as
set -v
(v for verbose) and causes lines of the procedure to be printed as they are read. It is useful to
help isolate syntax errors. It may be invoked without modifying the procedure by saying
sh -v proc ...
where proc is the name of the shell procedure. This flag may be used in conjunction with the
-n flag which prevents execution of subsequent commands. (Note that saying set -n at a
terminal will render the terminal useless until an end-of-file is typed.)
The command
set -x
will produce an execution trace. Following parameter substitution each command is printed
as it is executed. (Try these at the terminal to see what effect they have.) Both flags may be
turned off by saying
set -
and the current setting of the shell flags is available as $-.
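For example, tracing the wg procedure given earlier might produce output of roughly the
following form (the exact trace format varies between shells):
sh -x wg fred
+ who
+ grep fred
Each line prefixed by + shows a command, after substitution, as it is executed.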
18.17 Other important commands
18.17.1 The man command
The following is the man command which is used to print sections of the UNIX
manual. It is called, for example, as
$ man sh
$ man -t ed
$ man 2 fork
In the first the manual section for sh is printed. Since no section is specified, section 1 is
used. The second example will typeset (-t option) the manual section for ed. The third prints
the fork manual page from section 2.
cd /usr/man
: 'colon is the comment command'
: 'default is nroff ($N), section 1 ($s)'
N=n s=1
for i
do case $i in
[1-9]*)
s=$i ;;
-t) N=t ;;
-n) N=n ;;
-*) echo unknown flag \'$i\' ;;
*) if test -f man$s/$i.$s
then ${N}roff man0/${N}aa man$s/$i.$s
else : 'look through all manual sections'
found=no
for j in 1 2 3 4 5 6 7 8 9
do if test -f man$j/$i.$j
then man $j $i
found=yes
fi
done
case $found in
no) echo \'$i: manual page not found\'
esac
fi
esac
done
18.17.2 Keyword parameters
Shell variables may be given values by assignment or when a shell procedure is
invoked. An argument to a shell procedure of the form name=value that precedes the
command name causes value to be assigned to name before execution of the procedure
begins. The value of name in the invoking shell is not affected. For example,
user=fred command
will execute command with user set to fred. The -k flag causes arguments of the form
name=value to be interpreted in this way anywhere in the argument list. Such names are
sometimes called keyword parameters. If any arguments remain they are available as
positional parameters $1, $2, ....
The set command may also be used to set positional parameters from within a
procedure. For example,
set - *
will set $1 to the first file name in the current directory, $2 to the next, and so on. Note that
the first argument, -, ensures correct treatment when the first file name begins with a -.
18.17.3 Parameter transmission
When a shell procedure is invoked both positional and keyword parameters may be
supplied with the call. Keyword parameters are also made available implicitly to a shell
procedure by specifying in advance that such parameters are to be exported. For example,
export user box
marks the variables user and box for export. When a shell procedure is invoked copies are
made of all exportable variables for use within the invoked procedure. Modification of such
variables within the procedure does not affect the values in the invoking shell. It is generally
true of a shell procedure that it may not modify the state of its caller without explicit request
on the part of the caller. (Shared file descriptors are an exception to this rule.)
Names whose value is intended to remain constant may be declared readonly. The
form of this command is the same as that of the export command,
readonly name ...
Subsequent attempts to set readonly variables are illegal.
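For example,
readonly user
user=bert
causes the shell to reject the second assignment with a message such as user: is read only
(the exact wording varies between shells).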
18.17.4 Parameter substitution
If a shell parameter is not set then the null string is substituted for it. For example, if
the variable d is not set
echo $d
or
echo ${d}
will echo nothing. A default string may be given as in
echo ${d-.}
which will echo the value of the variable d if it is set and `.' otherwise. The default string is
evaluated using the usual quoting conventions so that
echo ${d-'*'}
will echo * if the variable d is not set. Similarly
echo ${d-$1}
will echo the value of d if it is set and the value (if any) of $1 otherwise. A variable may be
assigned a default value using the notation
echo ${d=.}
which substitutes the same string as
echo ${d-.}
and if d were not previously set then it will be set to the string `.'. (The notation ${...=...} is
not available for positional parameters.)
If there is no sensible default then the notation
echo ${d?message}
will echo the value of the variable d if it has one, otherwise message is printed by the shell
and execution of the shell procedure is abandoned. If message is absent then a standard
message is printed. A shell procedure that requires some parameters to be set might start as
follows.
: ${user?} ${acct?} ${bin?}
...
Colon (:) is a command that is built in to the shell and does nothing once its arguments have
been evaluated. If any of the variables user, acct or bin are not set then the shell will
abandon execution of the procedure.
18.17.5 Command substitution
The standard output from a command can be substituted in a similar way to
parameters. The command pwd prints on its standard output the name of the current directory.
For example, if the current directory is /usr/fred/bin then the command
d=`pwd`
is equivalent to
d=/usr/fred/bin
The entire string between grave accents (`...`) is taken as the command to be executed and is
replaced with the output from the command. The command is written using the usual quoting
conventions except that a ` must be escaped using a \. For example,
ls `echo "$1"`
is equivalent to
ls $1
Command substitution occurs in all contexts where parameter substitution occurs (including
here documents) and the treatment of the resulting text is the same in both cases. This
mechanism allows string processing commands to be used within shell procedures. An
example of such a command is basename which removes a specified suffix from a string. For
example,
basename main.c .c
will print the string main. Its use is illustrated by the following fragment from a cc command.
case $A in
...
*.c)
B=`basename $A .c`
...
esac
that sets B to the part of $A with the suffix .c stripped.
Here are some composite examples.
 for i in `ls -t`; do ...
The variable i is set to the names of files in time order, most recent first.
 set `date`; echo $6 $2 $3, $4
will print, e.g., 1977 Nov 1, 23:59:59
18.17.6 Evaluation and quoting
The shell is a macro processor that provides parameter substitution, command
substitution and file name generation for the arguments to commands. This section discusses
the order in which these evaluations occur and the effects of the various quoting mechanisms.
Commands are parsed initially according to the shell's grammar. Before
a command is executed the following substitutions occur.
parameter substitution, e.g. $user
command substitution, e.g. `pwd`
Only one evaluation occurs so that if, for example, the value of the variable X is the
string $y then
echo $X
will echo $y.
18.17.7 Error handling
The treatment of errors detected by the shell depends on the type of error and on
whether the shell is being used interactively. An interactive shell is one whose input and
output are connected to a terminal (as determined by gtty (2)). A shell invoked with the -i flag
is also interactive.
Execution of a command (see also section 18.17.9) may fail for any of the following reasons.
 Input - output redirection may fail, for example, if a file does not exist or cannot be created.
 The command itself does not exist or cannot be executed.
 The command terminates abnormally, for example, with a "bus error" or "memory fault". See the list of UNIX signals below.
 The command terminates normally but returns a non-zero exit status.
In all of these cases the shell will go on to execute the next command. Except for the last
case an error message will be printed by the shell. All remaining errors cause the shell to exit
from a command procedure. An interactive shell will return to read another command from
the terminal. Such errors include the following.
 Syntax errors, e.g., if ... then ... done
 A signal such as interrupt. The shell waits for the current command, if any, to finish execution and then either exits or returns to the terminal.
 Failure of any of the built-in commands such as cd.
The shell flag -e causes the shell to terminate if any error is detected. The following are some
of the UNIX signals and their values.
1 hangup
2 interrupt
3* quit
4* illegal instruction
5* trace trap
6* IOT instruction
7* EMT instruction
8* floating point exception
9 kill (cannot be caught or ignored)
10* bus error
11* segmentation violation
12* bad argument to system call
13 write on a pipe with no one to read it
14 alarm clock
15 software termination (from kill (1))
Those signals marked with an asterisk produce a core dump if not caught. However,
the shell itself ignores quit which is the only external signal that can cause a dump. The
signals in this list that are of potential interest to shell programs are 1, 2, 3, 14 and 15.
18.17.8 Fault handling
Shell procedures normally terminate when an interrupt is received from the terminal.
The trap command is used if some cleaning up is required, such as removing temporary files.
For example,
trap 'rm /tmp/ps$$; exit' 2
sets a trap for signal 2 (terminal interrupt), and if this signal is received will execute the
commands
rm /tmp/ps$$; exit
exit is another built-in command that terminates execution of a shell procedure. The exit is
required; otherwise, after the trap has been taken, the shell will resume executing the
procedure at the place where it was interrupted.
UNIX signals can be handled in one of three ways. They can be ignored, in which
case the signal is never sent to the process. They can be caught, in which case the process
must decide what action to take when the signal is received. Lastly, they can be left to cause
termination of the process without it having to take any further action. If a signal is being
ignored on entry to the shell procedure, for example, by invoking it in the background (see
section 18.17.9) then trap commands (and the signal) are ignored.
The use of trap is illustrated by this modified version of the touch command. The
cleanup action is to remove the file junk$$. The following is the touch command.
flag=
trap 'rm -f junk$$; exit' 1 2 3 15
for i
do case $i in
-c) flag=N ;;
*) if test -f $i
then ln $i junk$$; rm junk$$
elif test $flag
then echo file \'$i\' does not exist
else >$i
fi
esac
done
The trap command appears before the creation of the temporary file; otherwise it
would be possible for the process to die without removing the file.
Since there is no signal 0 in UNIX it is used by the shell to indicate the commands to be
executed on exit from the shell procedure.
A procedure may, itself, elect to ignore signals by specifying the null string as the argument
to trap. The following fragment is taken from the nohup command.
trap '' 1 2 3 15
which causes hangup, interrupt, quit and kill to be ignored both by the procedure and by
invoked commands.
Traps may be reset by saying
trap 2 3
which resets the traps for signals 2 and 3 to their default values. A list of the current values of
traps may be obtained by writing
trap
The procedure scan (given below) is an example of the use of trap where there is no
exit in the trap command. scan takes each directory in the current directory, prompts with its
name, and then executes commands typed at the terminal until an end of file or an interrupt is
received. Interrupts are ignored while executing the requested commands but cause
termination when scan is waiting for input.
d=`pwd`
for i in *
do if test -d $d/$i
then cd $d/$i
while echo "$i:"
trap exit 2
read x
do trap : 2; eval $x; done
fi
done
read x is a built-in command that reads one line from the standard input and places the
result in the variable x. It returns a non-zero exit status if either an end-of-file is read or an
interrupt is received.
18.17.9 Command execution
To run a command (other than a built-in) the shell first creates a new process using
the system call fork. The execution environment for the command includes input, output and
the states of signals, and is established in the child process before the command is executed.
The built-in command exec is used in the rare cases when no fork is required and simply
replaces the shell with a new command. For example, a simple version of the nohup
command looks like
trap '' 1 2 3 15
exec $*
The trap turns off the signals specified so that they are ignored by subsequently
created commands and exec replaces the shell by the command specified.
Most forms of input output redirection have already been described. In the following,
word is only subject to parameter and command substitution. No file name generation or
blank interpretation takes place so that, for example,
echo ... >*.c
will write its output into a file whose name is *.c. Input output specifications are evaluated
left to right as they appear in the command.
> word
The standard output (file descriptor 1) is sent to the file word which is created if it
does not already exist.
>> word
The standard output is sent to file word. If the file exists then output is appended (by
seeking to the end); otherwise the file is created.
< word
The standard input (file descriptor 0) is taken from the file word.
<< word
The standard input is taken from the lines of shell input that follow up to but not
including a line consisting only of word. If word is quoted then no interpretation of
the document occurs. If word is not quoted then parameter and command substitution
occur and \ is used to quote the characters \ $ ` and the first character of word. In the
latter case \newline is ignored (c.f. quoted strings).
>& digit
The file descriptor digit is duplicated using the system call dup (2) and the result is
used as the standard output.
<& digit
The standard input is duplicated from file descriptor digit.
<&-
The standard input is closed.
>&-
The standard output is closed.
Any of the above may be preceded by a digit in which case the file descriptor created
is that specified by the digit instead of the default 0 or 1. For example,
... 2>file
runs a command with message output (file descriptor 2) directed to file.
... 2>&1
runs a command with its standard output and message output merged. (Strictly speaking file
descriptor 2 is created by duplicating file descriptor 1 but the effect is usually to merge the
two streams.)
The environment for a command run in the background such as
list *.c | lpr &
is modified in two ways. Firstly, the default standard input for such a command is the empty
file /dev/null. This prevents two processes (the shell and the command), which are running in
parallel, from trying to read the same input. Chaos would ensue if this were not the case. For
example,
ed file &
would allow both the editor and the shell to read from the same input at the same time.
The other modification to the environment of a background command is to turn off
the QUIT and INTERRUPT signals so that they are ignored by the command. This allows
these signals to be used at the terminal without causing background commands to terminate.
For this reason the UNIX convention for a signal is that if it is set to 1 (ignored) then it is
never changed even for a short time. Note that the shell command trap has no effect for an
ignored signal.
18.17.10 Invoking the shell
The following flags are interpreted by the shell when it is invoked. If the first
character of argument zero is a minus, then commands are read from the file .profile.
-c string
If the -c flag is present then commands are read from string.
-s
If the -s flag is present or if no arguments remain then commands are read from the
standard input. Shell output is written to file descriptor 2.
-i
If the -i flag is present or if the shell input and output are attached to a terminal (as
told by gtty) then this shell is interactive. In this case TERMINATE is ignored (so that
kill 0 does not kill an interactive shell) and INTERRUPT is caught and ignored (so
that wait is interruptable). In all cases QUIT is ignored by the shell.
18.18 Let us sum up
In this lesson we have learnt about
c) the Kernel
d) and shell commands
18.19 Points for Discussion
a) Discuss the importance of UNIX
b) Differentiate between kernel and shell
18.20 Model Answers to Check your Progress
In order to check your progress, try to answer the following:
o What is a shell variable?
o What is the use of the man command?
o Describe the various control flows in the UNIX shell.
18.21 Lesson - end activities
After learning this chapter, try to discuss among your friends and answer these
questions to check your progress.
d) Discuss the kernel
e) Discuss any 10 shell commands
18.22 References
 H.M. Deitel, Chapter 18 of “Operating Systems”, Second Edition, Pearson
Education, 2001
 Andrew S. Tanenbaum, Chapter 7 of “Modern Operating Systems”, PHI, 1996
LESSON – 19: PROCESS, MEMORY AND I/O OF UNIX
CONTENTS
19.1 Aims and Objectives
19.2 Introduction to process management
19.2.1 ps
19.2.2 Runaway Processes
19.2.3 Killing Processes
19.2.4 Top
19.2.5 nice and renice
19.2.6 Job Control
19.3 Memory Management
19.3.1 Kinds of Memory
19.3.2 OS Memory Uses
19.3.3 Process Memory Uses
19.4 The Input/output system
19.4.1 Descriptors and I/O
19.4.2 Descriptor Management
19.4.3 Devices
19.4.4 Socket IPC
19.4.5 Scatter/Gather I/O
19.5 Let us sum up
19.6 Lesson - end activities
19.7 Points for Discussion
19.8 Model Answers to Check your Progress
19.9 References
19.1 Aims and Objectives
In this lesson we will learn about process, memory and I/O management in UNIX.
The objectives of this lesson are to make the candidate aware of the following:
o) Process concept
p) Memory concept
q) And Input/output of Unix
19.2 Introduction to Process Management
Any time you use a command on a Unix system, that command runs as a process on
the system. Once the command runs to completion, or is otherwise interrupted, the process
should terminate. Most commands do this flawlessly, however sometimes odd things will
happen. This document will attempt to help you understand how processes work and how to
manage your own processes.
19.2.1 ps
The first topic to cover is how to get information on processes. There are several ways to
do this, and this is not meant to be an exhaustive listing of methods.
processes are running under your username at this moment, use the command ps -u
username replacing the word "username" with your own username.
The output from ps shows four columns by default: PID, TTY, TIME, and CMD. PID
is short for process ID, a number which uniquely identifies the process on the system.
Whenever a process is started, it is given a PID. You can use this PID to interact with the
process at a later time. The TTY field shows the terminal associated with the process. You do
not generally need to worry about the value of this field. The TIME field displays the total
amount of CPU time the process has used, not the time since the process was started. This
will typically be quite small, unless your process is very CPU intensive. The final field is
CMD, which shows the name of the command associated with that process.
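A typical run might look like this (the process IDs, terminal and commands shown here are
only illustrative):
ps -u username
  PID TTY      TIME CMD
 5044 pts/2    0:01 bash
 5103 pts/2    0:00 ps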
If you are using Borg or Locutus, you will notice that the bash process will be
running any time you are logged in. This is the shell that you are currently using to run
commands. So long as bash is set as your default shell, it will be there when you are logged
in. If you get a friend to run this command from their account with your username while you
are not logged in, nothing should show up at all.
Please note that this is not the only way to use ps and for more information you
should type man ps on the command line for a detailed listing of the different options you
can use.
19.2.2 Runaway Processes
So what happens if you are not logged in and there are still processes running? This
can mean one of several things, but more than likely you have set a process to run in the
background by adding a & at the end of the command (more on this later) and did not close
the program or kill the process. Another possibility is that you have a program that has
run away. This means that the process should have completed properly and you have taken all
the correct steps, but for any number of reasons, it did not terminate properly.
19.2.3 Killing Processes
In order to kill a process that has run away or that you have otherwise lost control of,
you should use the kill command. There are two popular ways of using kill: kill pid where
"pid" is the process ID number listed under the PID column when you use ps. This is a
graceful way to kill your process by having the operating system send the process a signal
that tells it to stop what it's doing and terminate. Sometimes this doesn't work and you have to
use a more aggressive version of kill. kill -9 pid will get the operating system to kill the
process immediately. This can be dangerous if the process has control of a lock mechanism,
but your process will stop. If you find yourself completely stuck (terminals have frozen on
you and you just don't know what to do), you can log in again and use kill -9 -1. This
command will kill all of your processes including your current log in. The next time you log
in, everything that was running or was locked up will have terminated. Please note that using
this command as root will bring down a system quick, fast and in a hurry so use this
command only as a user, never as root. This may also result in lost work, so use it only as a
last resort.
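Putting this together, a typical sequence for dealing with a runaway process might be (the
process ID 5044 is illustrative):
ps -u username
kill 5044
ps -u username
kill -9 5044
First find the PID with ps, try a graceful kill, check whether the process has gone, and only
then resort to kill -9.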
19.2.4 Top
Another very useful command is the top command. This will allow you to see the
processes that are using the most CPU time, memory, etc... If you find that one of your
processes is running near the top of the listing offered by top, it means that you are taking up
more CPU time than most people on the system. You will see a column that has %CPU as its
heading. If your process is taking more than 5% of the CPU on Borg or Locutus for any
extended period of time, you should investigate to see what it's doing. It is quite possible that
you are using a program that does require this much CPU time. This is perfectly alright and
will not pose a problem. If, however, you find that the program isn't doing anything or should
already have completed you should consider killing it.
19.2.5 nice and renice
If you are about to run some program that you know will consume a significant
amount of resources on the servers, please send an e-mail to cshelp first to warn our technical
support team. If possible, you should first consider running such a process on workstations
rather than on the server. You can log in and run any process on our Sun workstations just as
well as on our servers. If you must run a job that requires a lot of CPU time, you should nice
the process. The nice and renice commands change the priority at which a process runs.
nice is used to start a process with a different priority, and renice changes the priority of a
running process. To start a process with nice, run nice -n nicevalue command arguments
where nicevalue is a number between 1 and 19, and command and arguments are the
command you want to run and its arguments. The larger the number given to nice, the lower
the priority command is given. Note that only the root user can increase priority with nice.
If you have already begun to run a process which is using a lot of CPU time, you can
also re-nice that process. renice acts the same way as nice; except it is used on running
processes. To re-nice a process, simply type renice -n nicevalue -p pid. Here, nicevalue is
the new priority, and pid is the process ID of the process you want to re-nice. You can also
renice all your processes, which is useful if you're feeling guilty about the amount of CPU time your
processes are using. To do this, simply use renice -n nicevalue -u username. Note that you
can never increase your priority with renice, nor can you use it on processes owned by other
users.
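For example, to start a hypothetical long-running program called bigjob at the lowest
priority, then lower the priority of an already running process with PID 5044, and finally
re-nice every process you own:
nice -n 19 bigjob &
renice -n 19 -p 5044
renice -n 19 -u username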
19.2.6 Job Control
Many shells, including the default bash shell, support what is known as job control.
Job control allows you to run several processes at the same time from the same terminal. You
can run processes that may take a long time to complete in the background, freeing the
terminal for other use. You can suspend running processes, move them to the background, put
background processes back into the foreground, and list all your currently running jobs.
Running processes in the background is important for processes that take a long time
to complete but do not require any user interaction from the shell. If you start a program such
as netscape from the command line, it is a good idea to place it in the background so the shell
is not tied up. All the output from the process is still visible, unless it is specifically disabled.
To run a process in the background, simply add an ampersand (&) to the end of the command
line:

    netscape &
    [1] 5044

The command-line prompt returns immediately, and the shell prints some important
information about the background process. The number inside the square brackets (here, 1) is
the job number of the background process. This number is used by other job control
commands to identify this process. The second number (here, 5044) is the process ID of the
process. The difference between the two numbers is that the job number is unique to the shell,
while the process ID is unique to the operating system.
If you run a program normally, but then decide you would like to run it in the
background, you must first suspend the process by hitting Ctrl-Z. This suspends execution of
the process, and gives you a command line. To move the suspended process to the
background, type bg. The process then resumes execution, exactly as if you had typed &
when starting the process. To later bring a background process back into the foreground, use
the fg command. This command takes an argument specifying which background process to
place in the foreground. This is the job number displayed when the process was placed in the
background. If you cannot remember the job number of the process you wish to bring back to
the foreground, the jobs command will list all the jobs you are currently running and their
numbers. This is similar to a small version of ps, but is not quite the same: the jobs command
only lists processes started from that particular shell.
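Under the hood, job control is built on signals: pressing Ctrl-Z makes the terminal driver send
SIGTSTP to the foreground process, and bg and fg resume a stopped job by sending it
SIGCONT. The sketch below suspends and later resumes a process by hand (5044 is a
hypothetical process ID):

    /* Suspend and resume a process using the signals behind job control.
       Sketch only; 5044 is a hypothetical process ID. */
    #include <signal.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = 5044;        /* hypothetical target PID */

        kill(pid, SIGTSTP);      /* what Ctrl-Z causes to be sent */
        sleep(10);               /* the process stays stopped meanwhile */
        kill(pid, SIGCONT);      /* what bg and fg send to resume it */
        return 0;
    }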
19.3 Memory Management
Unlike traditional PC operating systems, Unix-related systems use very sophisticated
memory-management algorithms to make efficient use of memory resources. This makes the
questions "How much memory do I have?" and "How much memory is being used?" rather
complicated to answer. First, you must realize that there are three different kinds of memory,
three different ways the operating system can use them, and three different ways processes
can use them.
19.3.1 Kinds of Memory
Main - The physical Random Access Memory located on the CPU motherboard that most
people think of when they talk about RAM. Also called Real Memory. This does not
include processor caches, video memory, or other peripheral memory.
File System - Disk memory accessible via pathnames. This does not include raw devices,
tape drives, swap space, or other storage not addressable via normal pathnames. It does
include all network file systems.
Swap Space - Disk memory used to hold data that is not in Real or File System memory.
Swap space is most efficient when it is on a separate disk or partition, but sometimes it is just
a large file in the File System.
19.3.2 OS Memory Uses
Kernel - The Operating System's own (semi-)private memory space. This is always in Main
memory.
Cache - Main memory that is used to hold elements of the File System and other I/O
operations. Not to be confused with the CPU cache or the disk drive's own cache, which are
not part of main memory.
Virtual - The total addressable memory space of all processes running on the given machine.
The physical location of such data may be spread among any of the three kinds of memory.
19.3.3 Process Memory Uses
Data - Memory allocated and used by the program (usually via malloc, new, or similar
runtime calls).
Stack - The program's execution stack (managed by the OS).
Mapped - File contents addressable within the process memory space.
The amount of memory available for processes is at least the size of Swap, minus
Kernel. On more modern systems (since around 1994) it is at least Main plus Swap minus
Kernel and may also include any files via mapping.
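To make the Data and Mapped categories concrete, the hedged sketch below obtains memory
both ways: once from the allocator (Data) and once by mapping a file into the address space
(Mapped). The file name /tmp/example is hypothetical:

    /* Data memory via malloc, Mapped memory via mmap.
       Sketch only; /tmp/example is a hypothetical file. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        char *data = malloc(4096);        /* Data: heap allocation */
        if (data != NULL)
            strcpy(data, "hello");

        int fd = open("/tmp/example", O_RDONLY);
        if (fd != -1) {
            /* Mapped: file contents become addressable memory */
            char *mapped = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
            if (mapped != MAP_FAILED) {
                putchar(mapped[0]);       /* read the file through a pointer */
                munmap(mapped, 4096);
            }
            close(fd);
        }
        free(data);
        return 0;
    }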
19.4 The Input/output system
The basic model of the UNIX I/O system is a sequence of bytes that can be accessed
either randomly or sequentially. There are no access methods and no control blocks in a
typical UNIX user process.
Different programs expect various levels of structure, but the kernel does not impose
structure on I/O. For instance, the convention for text files is lines of ASCII characters
separated by a single newline character (the ASCII line-feed character), but the kernel knows
nothing about this convention. For the purposes of most programs, the model is further
simplified to being a stream of data bytes, or an I/O stream. It is this single common data
form that makes the characteristic UNIX tool-based approach work (Kernighan & Pike, 1984).
An I/O stream from one program can be fed as input to almost any other program. (This kind
of traditional UNIX I/O stream should not be confused with the Eighth Edition stream I/O
system or with the System V, Release 3 STREAMS, both of which can be accessed as
traditional I/O streams.)
19.4.1 Descriptors and I/O
UNIX processes use descriptors to reference I/O streams. Descriptors are small
unsigned integers obtained from the open and socket system calls. The open system call takes
as arguments the name of a file and a permission mode to specify whether the file should be
open for reading or for writing, or for both. This system call also can be used to create a new,
empty file. A read or write system call can be applied to a descriptor to transfer data. The
close system call can be used to deallocate any descriptor.
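A minimal sketch of this descriptor life cycle (the file /etc/motd is just a convenient
example):

    /* Obtain a descriptor with open, transfer data with read and write,
       release it with close. Sketch only. */
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[512];
        int fd = open("/etc/motd", O_RDONLY);    /* descriptor: small integer */
        if (fd == -1)
            return 1;

        ssize_t n = read(fd, buf, sizeof buf);   /* transfer via the descriptor */
        if (n > 0)
            write(1, buf, n);                    /* 1 is standard output */

        close(fd);                               /* deallocate the descriptor */
        return 0;
    }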
Descriptors represent underlying objects supported by the kernel, and are created by
system calls specific to the type of object. In 4.4BSD, three kinds of objects can be
represented by descriptors: files, pipes, and sockets.
 A file is a linear array of bytes with at least one name. A file exists until all its names
are deleted explicitly and no process holds a descriptor for it. A process acquires a
descriptor for a file by opening that file's name with the open system call. I/O devices
are accessed as files.
 A pipe is a linear array of bytes, as is a file, but it is used solely as an I/O stream, and
it is unidirectional. It also has no name, and thus cannot be opened with open. Instead,
it is created by the pipe system call, which returns two descriptors, one of which
accepts input that is sent to the other descriptor reliably, without duplication, and in
order. The system also supports a named pipe or FIFO. A FIFO has properties
identical to a pipe, except that it appears in the filesystem; thus, it can be opened using
the open system call. Two processes that wish to communicate each open the FIFO:
One opens it for reading, the other for writing.
 A socket is a transient object that is used for interprocess communication; it exists
only as long as some process holds a descriptor referring to it. A socket is created by
the socket system call, which returns a descriptor for it. There are different kinds of
sockets that support various communication semantics, such as reliable delivery of
data, preservation of message ordering, and preservation of message boundaries.
In systems before 4.2BSD, pipes were implemented using the filesystem; when
sockets were introduced in 4.2BSD, pipes were reimplemented as sockets.
The kernel keeps for each process a descriptor table, which is a table that the kernel
uses to translate the external representation of a descriptor into an internal representation.
(The descriptor is merely an index into this table.) The descriptor table of a process is
inherited from that process's parent, and thus access to the objects to which the descriptors
refer also is inherited. The main ways that a process can obtain a descriptor are by opening or
creation of an object, and by inheritance from the parent process. In addition, socket IPC
allows passing of descriptors in messages between unrelated processes on the same machine.
Every valid descriptor has an associated file offset in bytes from the beginning of the
object. Read and write operations start at this offset, which is updated after each data transfer.
For objects that permit random access, the file offset also may be set with the lseek system
call. Ordinary files permit random access, and some devices do, as well. Pipes and sockets do
not.
When a process terminates, the kernel reclaims all the descriptors that were in use by
that process. If the process was holding the final reference to an object, the object's manager
is notified so that it can do any necessary cleanup actions, such as final deletion of a file or
deallocation of a socket.
19.4.2 Descriptor Management
Most processes expect three descriptors to be open already when they start running.
These descriptors are 0, 1, 2, more commonly known as standard input, standard output, and
standard error, respectively. Usually, all three are associated with the user's terminal by the
login process and are inherited through fork and exec by processes run by
the user. Thus, a program can read what the user types by reading standard input, and the
program can send output to the user's screen by writing to standard output. The standard error
descriptor also is open for writing and is used for error output, whereas standard output is
used for ordinary output.
These (and other) descriptors can be mapped to objects other than the terminal; such
mapping is called I/O redirection, and all the standard shells permit users to do it. The shell
can direct the output of a program to a file by closing descriptor 1 (standard output) and
opening the desired output file to produce a new descriptor 1. It can similarly redirect
standard input to come from a file by closing descriptor 0 and opening the file.
Pipes allow the output of one program to be input to another program without
rewriting or even relinking of either program. Instead of descriptor 1 (standard output) of the
source program being set up to write to the terminal, it is set up to be the input descriptor of a
pipe. Similarly, descriptor 0 (standard input) of the sink program is set up to reference the
output of the pipe, instead of the terminal keyboard. The resulting set of two processes and
the connecting pipe is known as a pipeline. Pipelines can be arbitrarily long series of
processes connected by pipes.
The open, pipe, and socket system calls produce new descriptors with the lowest
unused number usable for a descriptor. For pipelines to work, some mechanism must be
provided to map such descriptors into 0 and 1. The dup system call creates a copy of a
descriptor that points to the same file-table entry. The new descriptor is also the lowest
unused one, but if the desired descriptor is closed first, dup can be used to do the desired
mapping. Care is required, however: If descriptor 1 is desired, and descriptor 0 happens also
to have been closed, descriptor 0 will be the result. To avoid this problem, the system
provides the dup2 system call; it is like dup, but it takes an additional argument specifying
the number of the desired descriptor (if the desired descriptor was already open, dup2 closes
it before reusing it).
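The following sketch puts these pieces together the way a shell might, using pipe, fork, and
dup2 to build the equivalent of the pipeline ls | wc (the two commands are arbitrary
examples):

    /* Build a two-process pipeline: ls | wc. Sketch only. */
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int pfd[2];
        pipe(pfd);                     /* pfd[0]: read end, pfd[1]: write end */

        if (fork() == 0) {             /* child 1: the source program */
            dup2(pfd[1], 1);           /* standard output now feeds the pipe */
            close(pfd[0]);
            close(pfd[1]);
            execlp("ls", "ls", (char *)0);
            _exit(1);                  /* reached only if exec fails */
        }
        if (fork() == 0) {             /* child 2: the sink program */
            dup2(pfd[0], 0);           /* standard input now reads the pipe */
            close(pfd[0]);
            close(pfd[1]);
            execlp("wc", "wc", (char *)0);
            _exit(1);
        }
        close(pfd[0]);                 /* parent closes both ends, or the */
        close(pfd[1]);                 /* sink would never see end-of-file */
        wait(NULL);
        wait(NULL);
        return 0;
    }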
19.4.3 Devices
Hardware devices have filenames, and may be accessed by the user via the same
system calls used for regular files. The kernel can distinguish a device special file or special
file, and can determine to what device it refers, but most processes do not need to make this
determination. Terminals, printers, and tape drives are all accessed as though they were
streams of bytes, like 4.4BSD disk files. Thus, device dependencies and peculiarities are kept
in the kernel as much as possible, and even in the kernel most of them are segregated in the
device drivers.
Hardware devices can be categorized as either structured or unstructured; they are
known as block or character devices, respectively. Processes typically access devices through
special files in the filesystem. I/O operations to these files are handled by kernel-resident
software modules termed device drivers. Most network-communication hardware devices are
accessible through only the interprocess-communication facilities, and do not have special
files in the filesystem name space, because the raw-socket interface provides a more natural
interface than does a special file.
Structured or block devices are typified by disks and magnetic tapes, and include most
random-access devices. The kernel supports read-modify-write-type buffering actions on
block-oriented structured devices to allow the latter to be read and written in a totally random
byte-addressed fashion, like regular files. Filesystems are created on block devices.
Unstructured devices are those devices that do not support a block structure. Familiar
unstructured devices are communication lines, raster plotters, and unbuffered magnetic tapes
and disks. Unstructured devices typically support large block I/O transfers.
Unstructured devices are called character devices because the first of these to be
implemented were terminal device drivers. The kernel interface to the driver for these devices
proved convenient for other devices that were not block structured.
Device special files are created by the mknod system call. There is an additional
system call, ioctl, for manipulating the underlying device parameters of special files. The
operations that can be done differ for each device. This system call allows the special
characteristics of devices to be accessed, rather than overloading the semantics of other
system calls. For example, there is an ioctl on a tape drive to write an end-of-tape mark,
instead of there being a special or modified version of write.
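On systems that provide the magnetic-tape interface in <sys/mtio.h>, that end-of-tape
operation looks roughly like the sketch below. The device name is hypothetical, and the exact
header and structure names vary from system to system:

    /* Write an end-of-file mark on a tape with ioctl, rather than a
       special version of write. Sketch only; details vary by system. */
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mtio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/tape", O_WRONLY);   /* hypothetical special file */
        if (fd == -1)
            return 1;

        struct mtop op;
        op.mt_op = MTWEOF;                      /* the device-specific operation */
        op.mt_count = 1;                        /* write one end-of-file mark */
        ioctl(fd, MTIOCTOP, &op);               /* device control, not write */

        close(fd);
        return 0;
    }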
19.4.4 Socket IPC
The 4.2BSD kernel introduced an IPC mechanism more flexible than pipes, based on
sockets. A socket is an endpoint of communication referred to by a descriptor, just like a file
or a pipe. Two processes can each create a socket, and then connect those two endpoints to
produce a reliable byte stream. Once connected, the descriptors for the sockets can be read or
written by processes, just as the latter would do with a pipe. The transparency of sockets
allows the kernel to redirect the output of one process to the input of another process residing
on another machine. A major difference between pipes and sockets is that pipes require a
common parent process to set up the communications channel. A connection between sockets
can be set up by two unrelated processes, possibly residing on different machines.
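The simplest way to see a connected pair of socket descriptors, with no networking involved,
is the socketpair system call. The sketch below creates the pair and passes one byte between
the endpoints within a single process:

    /* A connected pair of stream sockets: readable and writable like a
       pipe, but bidirectional. Sketch only. */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int sv[2];
        char c;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
            return 1;

        write(sv[0], "x", 1);     /* either endpoint may send ... */
        read(sv[1], &c, 1);       /* ... and either may receive */
        printf("received %c\n", c);

        close(sv[0]);
        close(sv[1]);
        return 0;
    }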
System V provides local interprocess communication through FIFOs (also known as
named pipes). FIFOs appear as an object in the filesystem that unrelated processes can open
and send data through in the same way as they would communicate through a pipe. Thus,
FIFOs do not require a common parent to set them up; they can be connected after a pair of
processes are up and running. Unlike sockets, FIFOs can be used on only a local machine;
they cannot be used to communicate between processes on different machines. FIFOs are
implemented in 4.4BSD only because they are required by the POSIX.1 standard. Their
functionality is a subset of the socket interface.
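In practice a FIFO is created with the mkfifo call and then opened by name like any file. A
hedged sketch of the writing side, assuming a hypothetical pathname, appears below; an
unrelated process that opens the same name for reading receives the data:

    /* Create a named pipe and write to it. Any unrelated process that
       opens the same pathname for reading receives the data.
       Sketch only; /tmp/myfifo is a hypothetical pathname. */
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        mkfifo("/tmp/myfifo", 0666);    /* the FIFO appears in the filesystem */

        /* open blocks until another process opens the FIFO for reading */
        int fd = open("/tmp/myfifo", O_WRONLY);
        if (fd == -1)
            return 1;

        write(fd, "hello\n", 6);
        close(fd);
        return 0;
    }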
The socket mechanism requires extensions to the traditional UNIX I/O system calls to
provide the associated naming and connection semantics. Rather than overloading the
existing interface, the developers used the existing interfaces to the extent that the latter
worked without being changed, and designed new interfaces to handle the added semantics.
The read and write system calls were used for byte-stream type connections, but six new
system calls were added to allow sending and receiving addressed messages such as network
datagrams. The system calls for writing messages include send, sendto, and sendmsg. The
system calls for reading messages include recv, recvfrom, and recvmsg. In retrospect, the first
two in each class are special cases of the others; recvfrom and sendto probably should have
been added as library interfaces to recvmsg and sendmsg, respectively.
19.4.5 Scatter/Gather I/O
In addition to the traditional read and write system calls, 4.2BSD introduced the
ability to do scatter/gather I/O. Scatter input uses the readv system call to allow a single read
to be placed in several different buffers. Conversely, the writev system call allows several
different buffers to be written in a single atomic write. Instead of passing a single buffer and
length parameter, as is done with read and write, the process passes in a pointer to an array of
buffers and lengths, along with a count describing the size of the array.
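A short sketch of gather output with writev: a record header and its data live in two separate
buffers but reach the descriptor in a single atomic write (the buffer contents are arbitrary
examples):

    /* Gather output: two separate buffers written atomically with writev. */
    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void)
    {
        char header[] = "HDR:";           /* example record header */
        char data[]   = "payload\n";      /* example record data */
        struct iovec iov[2];

        iov[0].iov_base = header;         /* first buffer to gather */
        iov[0].iov_len  = strlen(header);
        iov[1].iov_base = data;           /* second buffer to gather */
        iov[1].iov_len  = strlen(data);

        writev(1, iov, 2);    /* one atomic write to standard output */
        return 0;
    }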
This facility allows buffers in different parts of a process address space to be written
atomically, without the need to copy them to a single contiguous buffer. Atomic writes are
necessary in the case where the underlying abstraction is record based, such as tape drives
that output a tape block on each write request. It is also convenient to be able to read a single
request into several different buffers (such as a record header into one place and the data into
another). Although an application can simulate the ability to scatter data by reading the data
into a large buffer and then copying the pieces to their intended destinations, the cost of
memory-to-memory copying in such cases often would more than double the running time of
the affected application.
Just as send and recv could have been implemented as library interfaces to sendto and
recvfrom, it also would have been possible to simulate read with readv and write with writev.
However, read and write are used so much more frequently that the added cost of simulating
them would not have been worthwhile.
19.5 Let us sum up
In this lesson we have learnt about
a) process management in UNIX
b) memory management in UNIX
c) input/output in UNIX
19.6 Points for Discussion
Discuss input/output in UNIX.
19.7 Model Answers to Check your Progress
To check your progress, try to write short notes on the following:
a) Socket IPC
b) Descriptor management
19.8 Lesson - end activities
After learning this chapter, try to discuss among your friends and answer these questions to
check your progress.
a) Discuss process management in UNIX
b) Discuss memory management in UNIX
19.9 References
 H.M. Deitel, Chapter 18 of “Operating Systems”, Second Edition, Pearson Education,
2001
 Andrew S. Tanenbaum, Chapter 7 of “Modern Operating Systems”, PHI, 1996