CS 346 – Chapter 1
• Operating system – definition
• Responsibilities
• What we find in computer systems
• Review of
– Instruction execution
– Compile – link – load – execute
• Kernel versus user mode
Questions 
• What is the purpose of a computer?
• What if all computers became fried or infected?
• How did Furman function before 1967 (the year
we bought our first computer)?
• Why do people not like computers?
Definition
• How do you define something? Possible approaches:
– What it consists of
– What it does (a functional definition) – purpose
– What if we didn’t have it
– What else it’s similar to
• OS = set of software between user and HW
– Provides “environment” for user to work
– Convenience and efficiency
– Manage the HW / resources
– Ensure correct and appropriate operation of machine
• 2 Kinds of software: application and system
– Distinction is blurry; no universal definition for “system”
Some responsibilities
• Can glean from table of contents 
– Book compares an OS to a government
– Don’t worry about details for now
• Security: logins
• Manage resources
– Correct and efficient use of CPU
– Disk: “memory management”
– network access
• File management
• I/O, terminal, devices
• Kernel vs. shell
Big picture
• Computer system has: CPU, main memory, disk, I/O
devices
• Turn on computer:
– Bootstrap program already in ROM comes to life
– Tells where to find the OS on disk. Load the OS.
– Transfer control to OS once loaded.
• From time to time, control is “interrupted”
– Examples?
• Memory hierarchy
– Several levels of memory in use from registers to tape
– Closer to CPU: smaller, faster, more expensive
– OS must decide who belongs where
Big picture (2)
• von Neumann program execution
– Fetch, decode, execute, data access, write result
– OS usually not involved unless problem
• Compiling
– 1 source file → 1 object file
– 1 entire program → 1 executable file
– “link” object files to produce executable
– Code may be optimized to please the OS
– When you invoke a program, OS calls a “loader” program that
precedes execution
• I/O
– Each device has a controller, a circuit containing registers and a
memory buffer
– Each controller is managed by a device driver (software)
2 modes
• When CPU executing instructions, nice to know if the
instruction is on behalf of the OS
• OS should have the highest privileges → kernel mode
– Some operations only available to OS
– Examples?
• Users should have some restriction → user mode
• A hardware bit can be set if program is running in kernel
mode
• Sometimes, the user needs OS to help out, so we
perform a system call
Management topics
• What did we ask the OS to do during lab?
• File system
• Program vs. process
– “job” and “task” are synonyms of process
– Starting, destroying processes
– Process communication
– Make sure 2 processes don’t interfere with each other
• Multiprogramming
– CPU should never be idle
– Multitasking: give each job a short quantum of time to take turns
– If a job needs I/O, give CPU to another job
More topics
• Scheduling: deciding the order to do the jobs
– Detect system “load”
– In a real-time system, jobs have deadlines. OS should know
worst-case execution time of jobs
• Memory hierarchy
– Higher levels “bank” (cache) the lower levels
– OS manages RAM/disk decision
– Virtual memory: actual size of RAM is invisible to user. Allow
programmer to think memory is huge
– Allocate and deallocate heap objects
– Schedule disk ops and backups of data
CS 346 – Chapter 2
• OS services
– OS user interface
– System calls
– System programs
• How to make an OS
– Implementation
– Structure
– Virtual machines
• Commitment
– For next day, please finish chapter 2.
OS services
2 types
• For the user’s convenience
– Shell
– Running user programs
– Doing I/O
– File system
– Detecting problems
• Internal/support
– Allocating resources
– System security
– Accounting
• Infamous KGB spy ring uncovered due to discrepancy in
billing of computer time at Berkeley lab
User interface
• Command line = shell program
– Parses commands from user
– Supports redirection of I/O (stdin, stdout, stderr)
• GUI
– Pioneered by Xerox PARC, made famous by Mac
– Utilizes additional input devices such as mouse
– Icons or hotspots on screen
• Hybrid approach
– GUI allowing several terminal windows
– Window manager
System calls
• “an interface for accessing an OS service within a
computer program”
• A little lower level than an API, but similar
• Looks like a function call
• Examples
– Performing any I/O request, because these are not defined by
the programming language itself
e.g. read(file_ptr, str_buf_ptr, 80);
– assembly languages typically have “syscall” instruction.
When is it used?
How?
• If many parameters, they may be put on runtime stack
Types of system calls
• Controlling a process
• File management
• Device management
• Information
• Communication between processes
• What are some specific examples you’d expect to find?
System programs
• Also called system utilities
• Distinction between “system call” and “system program”
• Examples
– Shell commands like ls, lp, ps, top
– Text editors, compilers
– Communication: e-mail, talk, ftp
– Miscellaneous: cal, fortune
– What are your favorites?
• Higher level software includes:
– Spreadsheets, text formatters, etc.
– But, boundary between “application” and “utility” software is
blurry. A text formatter is a type of compiler!
OS design ideas
• An OS is a big program, so we should consider
principles of systems analysis and software engineering
• In design phase, need to consider policies and
mechanisms
– Policy = What should we do; should we do X
– Mechanism = how to do X
– Example:  a way to schedule jobs (policy)
versus: what input needed to produce schedule, how schedule
decision is specified (mechanism)
Implementation
• Originally in assembly
• Now usually in C (C++ if object-oriented)
• Still, some code needs to be in assembly
– Some specific device driver routines
– Saving/restoring registers
• We’d like to use HLL as much as possible – why?
• Today’s compilers produce very efficient code – what
does this tell us?
• How to improve performance of OS:
– More efficient data structure, algorithm
– Exploit HW and memory hierarchy
– Pay attention to CPU scheduling and memory management
Kernel structure
• Possible to implement minimal OS with a few thousand
lines of code → monolithic kernel
– Modularize like any other large program
– After about 10k loc, difficult to prove correctness
• Layered approach to managing the complexity
– Layer 0 is the HW
– Layer n is the user interface
– Each layer makes use of routines and d.s. defined at lower
levels
– # layers difficult to predict: many subtle dependencies
– Many layers → lots of internal system call overhead
Kernel structure (2)
• Microkernel
– Kernel = minimal support for processes and memory
management
– (The rest of the OS is at user level)
– Adding OS services doesn’t require changing kernel, so easier to
modify OS
– The kernel must manage communication between user program
and appropriate OS services (e.g. file system)
– Microsoft gave up on the microkernel idea for Windows XP
• OO Module approach
– Components isolated (OO information hiding)
– Used by Linux, Solaris
– Like a layered approach with just 2 layers, a core and everything
else
Virtual machine
• How to make 1 machine behave like many
• Give users the illusion they have access to real HW,
distinct from other users
• Figure 2.17 levels of abstraction:
– Processes / kernels / VM’s / VM implementations / host HW
As opposed to:
– Processes / kernels / different machines
• Why do it?
– To test multiple OS’s on the same HW platform
– Host machine’s real HW protected from virus in a VM bubble
VM implementation
• It’s hard!
– Need to painstakingly replicate every HW detail, to avoid giving
away the illusion
– Need to keep track of what each guest OS is doing (whether it’s
in kernel or user mode)
– Each VM must interpret its assembly code – why? Is this a
problem?
• Very similar concept: simulation
– Often, all we are interested in is changing the HW, not the OS;
for example, adding/eliminating the data cache
– Write a program that simulates every HW feature, providing the
OS with the expected behavior
CS 346 – Chapter 3
• What is a process
• Scheduling and life cycle
• Creation
• Termination
• Interprocess communication: purpose, how to do it
• Client-server: sockets, remote procedure call
• Commitment
– Please read through section 3.4 by Wednesday and
3.6 by Friday.
Process
• Goal: to be able to run > 1 program concurrently
– We don’t have to finish one before starting another
– Concurrent doesn’t mean parallel
– CPU often switches from one job to another
• Process = a program that has started but hasn’t yet
finished
• States:
– New, Ready, Running, Waiting, Terminated
– What transitions exist between these states?
Contents
• A process consists of:
– Code (“text” section)
– Program Counter
– Data section
– Run-time stack
– Heap allocated memory
• A process is represented in
kernel by a Process Control
Block, containing:
– State
– Program counter
– Register values
– Scheduling info (e.g. priority)
– Memory info (e.g. bounds)
– Accounting (e.g. time)
– I/O info (e.g. which files open)
– What is not stored here?
Scheduling
• Typically many processes are ready, but only 1 can run
at a time.
– Need to choose who’s next from ready queue
– Can’t stay running for too long!
– At some point, process needs to be switched out temporarily
back to the ready queue (Fig. 3.4)
• What happens to a process? (Fig. 3.7)
– New process enters ready queue. At some point it can run.
– After running awhile, a few possibilities:
1. Time quantum expires. Go back to ready queue.
2. Need I/O. Go to I/O queue, do I/O, re-enter ready queue!
3. Interrupted. Handle interrupt, and go to ready queue.
– Context switch overhead 
Creation
• Processes can spawn other processes.
– Parent / child relationship
– Tree
– Book shows Solaris example:
In the beginning, there was sched, which spawned init (the
ancestor of all user processes), the memory manager, and the
file manager.
– Process ID’s are unique integers (up to some max, e.g. 2^15)
• What should happen when process created?
– OS policy on what resources for baby: system default, or copy
parent’s capabilities, or specify at its creation
– What program does child run? Same as parent, or new one?
– Does parent continue to execute, or does it wait (i.e. block)?
How to create
• Unix procedure is typical…
• Parent calls fork( )
– This creates duplicate process.
– fork( ) returns 0 for child; positive number for parent; negative
number if error. (How could we have error?)
• Next, we call exec( ) to tell child what program to run.
– Do this immediately after fork
– Do inside the if clause that corresponds to case that we are
inside the child!
• Parent can call wait( ) to go to sleep.
– Not executing, not in ready queue
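A minimal C sketch of this fork / exec / wait pattern (the program run
by the child, ls, is an arbitrary choice; error handling abbreviated):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();                      /* duplicate this process */
        if (pid < 0) {                           /* negative: fork failed */
            perror("fork"); exit(1);
        }
        else if (pid == 0) {                     /* zero: we are the child */
            execlp("ls", "ls", "-l", (char *)0); /* replace child with new program */
            perror("exec"); exit(1);             /* reached only if exec failed */
        }
        else {                                   /* positive: parent; pid = child's ID */
            wait(NULL);                          /* sleep until the child terminates */
            printf("child %d finished\n", (int)pid);
        }
        return 0;
    }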
Termination
• Assembly programs end with a system call to exit( ).
– An int value is returned to parent’s wait( ) function. This lets
parent know which child has just finished.
• Or, process can be killed prematurely
– Why?
– Only the parent (or ancestor) can kill another process – why this
restriction?
• When a process dies, 2 possible policies:
– OS can kill all descendants (rare)
– Allow descendants to continue, but set parent of dead process to
init
IPC Examples
• Allowing concurrent access to information 
– Producer / consumer is a common paradigm
• Distributing work, as long as spare resources (e.g. CPU)
are around
• A program may need result of another program
– IPC more efficient than running serially and redirecting I/O
– A compiler may need result of timing analysis in order to know
which optimizations to perform
• Note: ease of programming is based on what OS and
programming language allow
2 techniques
• Shared memory
– 2 processes have access to an overlapping area of memory
– Conceptually easier to learn, but be careful!
– OS overhead only at the beginning: get kernel permission to set
up shared region
• Message passing
– Uses system calls, with kernel as middle man – easier to code
correctly
– System call overhead for every message → we’d want amount
of data to be small
– Definitely better when processes on different machines
• Often, both approaches are possible on the system
Shared memory
• Usually forbidden to touch another process’ memory
area
• Each program must be written so that the shared
memory request is explicit (via system call)
– An overlapping “buffer” can be set up. Range of addresses. But
there is no need for the buffer to be contiguous in memory with
the existing processes.
– Then, the buffer can be treated like an array (of char)
• Making use of the buffer (p. 122)
– Insert( ) function
– Remove( ) function
– Circular array… does the code make sense to you?
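A sketch of the circular-array logic in the spirit of the book’s
insert( ) / remove( ) code (shared-region setup omitted; like the
book’s version, the buffer holds at most SIZE – 1 items; remove is
renamed here to avoid the C library function of the same name):

    #define SIZE 10
    typedef int item;

    item buffer[SIZE];        /* these three live in the shared region */
    int in = 0;               /* next free slot */
    int out = 0;              /* next full slot */

    void insert(item x)
    {
        while ((in + 1) % SIZE == out)
            ;                 /* spin: buffer full */
        buffer[in] = x;
        in = (in + 1) % SIZE;
    }

    item remove_item(void)
    {
        while (in == out)
            ;                 /* spin: buffer empty */
        item x = buffer[out];
        out = (out + 1) % SIZE;
        return x;
    }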
Shared memory (2)
• What could go wrong?... How to fix?
• Trying to insert into full buffer
• Trying to remove from empty buffer
• Sound familiar?
• Also: both trying to insert. Is this a problem?
Message passing
• Make continual use of system calls:
– Send( )
– Receive( )
• Direct or indirect communication?
– Direct: send (process_num, the_message)
Hard coding the process we’re talking to
– Indirect: send (mailbox_num, the_message)
Assuming we’ve set up a “mailbox” inside the kernel
• Flexibility: can have a communication link with more
than 2 processes. e.g. 2 producers and 1 consumer
• Design issues in case we have multiple consumers
– We could forbid it
– Could be first-come-first-serve
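A UNIX pipe is about the simplest kernel-mediated message channel and
shows the idea in miniature (a sketch, not the book’s code: write( )
plays the role of send( ), read( ) of receive( )):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd[2];
        pipe(fd);                   /* fd[0] = read end, fd[1] = write end */

        if (fork() == 0) {          /* child: the receiver */
            char buf[64];
            int n = read(fd[0], buf, sizeof buf - 1);  /* blocks until a message arrives */
            buf[n] = '\0';
            printf("child received: %s\n", buf);
            return 0;
        }
        write(fd[1], "hello", 5);   /* parent: the sender; kernel buffers it */
        return 0;
    }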
Synchronization
• What should we do when we send/receive a message?
• Block (or “wait”):
– Go to sleep until counterpart acts.
– If you send, sleep until received by process or mailbox.
– If you receive, block until a message available. How do we
know?
• Don’t block
– Just keep executing. If they drop the baton it’s their fault.
– In case of receive( ), return null if there is no message (where do
we look?)
• We may need some queue of messages (set up in
kernel) so we don’t lose messages!
Buffer messages
• The message passing may be direct (to another specific
process) or indirect (to a mailbox – no process explicitly
stated in the call).
• But either way, we don’t want to lose messages.
• Zero capacity: sender blocks until recipient gets
message
• Bounded capacity (common choice): Sender blocks if
the buffer is full.
• Unbounded capacity: Assume buffer is infinite. Never
block when you send.
Socket
• Can be used as an “endpoint of communication”
• Attach to a (software) port on a “host” computer
connected to the Internet
– 156.143.143.132:1625 means port # 1625 on the machine
whose IP number is 156.143.143.132
– Port numbers < 1024 are pre-assigned for “well known” tasks.
For example, port 80 is for a Web server.
• With a pair of sockets, you can communicate between
them.
• Generally used for remote I/O
Implementation
• Syntax depends on language.
• Server
– Create socket object on some local port.
– Wait for client to call. Accept connection.
– Set up output stream for client.
– Write data to client.
– Close client connection.
– Go back to wait
• Client
– Create socket object to connect to server
– Read input analogous to file input or stdin
– Close connection to server
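A bare-bones C sketch of the server steps above, using the BSD socket
API (port 1625 echoes the earlier example; error checking omitted):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
        int server = socket(AF_INET, SOCK_STREAM, 0);  /* create socket object */

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(1625);                   /* some local port */
        bind(server, (struct sockaddr *)&addr, sizeof addr);

        listen(server, 5);                             /* wait for clients to call */
        for (;;) {
            int client = accept(server, NULL, NULL);   /* accept connection */
            const char *msg = "hello from server\n";
            write(client, msg, strlen(msg));           /* write data to client */
            close(client);                             /* close client connection */
        }                                              /* go back to wait */
    }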
Remote procedure call
• Useful application of inter-process communication (the
message-passing version)
• Systematic way to make procedure call between processes
on the network
– Reduce implementation details for user
• Client wants to call foreign function with some parameters
– Tell kernel server’s IP number and function name
– 1st message: ask server which port corresponds with function
– 2nd message: sending function call with “marshalled” parameters
– Server daemon listens for function call request, and processes
– Client receives return value
• OS should ensure function call successful (once)
CS 346 – Chapter 4
• Threads
– How they differ from processes
– Definition, purpose
Threads of the same process share: code, data, open files
– Types
– Support by kernel and programming language
– Issues such as signals
– User thread implementation: C and Java
• Commitment
– For next day, please read chapter 4
Thread intro
• Also called “lightweight process”
• One process may have multiple threads of execution
• Allows a process to do 2+ things concurrently 
– Games
– Simulations
• Even better: if you have 2+ CPU’s, you can execute in
parallel
• Multicore architecture → demand for multithreaded
applications for speedup
• More efficient than using several concurrent processes
Threads
• A process contains:
– Code, data, open files, registers, memory usage (stack + heap),
program counter
• Threads of the same process share
– Code, data, open files
• What is unique to each thread?
• Can you think of example of a computational algorithm
where threads would be a great idea?
– Splitting up the code
– Splitting up the data
• Any disadvantages?
2 types of threads
• User threads
– Can be managed / controlled by user
– Need existing programming language API support:
POSIX threads in C
Java threads
• Kernel threads
– Management done by the kernel
• Possible scenarios
– OS doesn’t support threading
– OS supports threads, but only at kernel level – you have no direct
control, except possibly by system call
– User can create thread objects and manipulate them. These
objects map to “real” kernel threads.
Multithreading models
• Many-to-one: User can create several thread objects,
but in reality the kernel only gives you one.
Multithreading is an illusion
• One-to-one: Each user thread maps to 1 real kernel
thread. Great but costly to OS. There may be a hard
limit to # of live threads.
• Many-to-many: A happy compromise. We have
multithreading, but the number of true threads may be
less than # of thread objects we created.
– A variant of this model “two-level” allows user to designate a
thread as being bound to one kernel thread.
Thread issues
• What should OS do if a thread calls fork( )?
– Can duplicate just the calling thread
– Can duplicate all threads in the process
• exec ( ) is designed to replace entire current process
• Cancellation
– kill thread before it’s finished
– “Asynchronous cancellation” = kill now. But it may be in the
middle of an update, or it may have acquired resources.
You may have noticed that Windows sometimes won’t let you
delete a file because it thinks it’s still open.
– “Deferred cancellation”. Thread periodically checks to see if it’s
time to quit. Graceful exit.
Signals
• Reminiscent of exception in Java
• Occurs when OS needs to send message to a process
– Some defined event generates a signal
– OS delivers signal
– Recipient must handle the signal.
Kernel defines a default handler – e.g. kill the process.
Or, user can write specific handler.
• Types of signals
– Synchronous: something in this program caused the event
– Asynchronous: event was external to my program
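A minimal sketch of registering a user-written handler in C (SIGINT is
the ctrl-C signal; write( ) is used because printf is not safe inside
a handler):

    #include <signal.h>
    #include <unistd.h>

    void handler(int sig)
    {
        /* replaces the default action (kill the process) */
        write(STDOUT_FILENO, "caught SIGINT\n", 14);
    }

    int main(void)
    {
        signal(SIGINT, handler);   /* register our handler for this signal */
        for (;;)
            pause();               /* sleep until some signal is delivered */
    }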
Signals (2)
• But what if process has multiple threads? Who gets the
signal? For a given signal, choose among 4 possibilities:
– Deliver signal to the 1 appropriate thread
– Deliver signal to all threads
– Have the signal indicate which threads to contact
– Designate a thread to receive all signals
• Rules of thumb…
– Synchronous event → just deliver to 1 thread
– User hit ctrl-C → kill all threads
Thread pool
• Like a motor pool
• When process starts, can create a set of threads that sit
around and wait for work
• Motivation
– overhead in creating/destroying
– We can set a bound for total number of threads, and avoid
overloading system later
• How many threads?
– User can specify
– Kernel can base on available resources (memory and # CPU’s)
– Can dynamically change if necessary
POSIX threads
• aka “Pthreads”
• C language
• Commonly seen in UNIX-style environments:
– Mac OS, Linux, Solaris
• POSIX is a set of standards for OS system calls
– Thread support is just one aspect
• POSIX provides an API for thread creation and
synchronization
• API specifies behavior of thread functionality, but not the
low-level implementation
Pthread functions
• pthread_attr_init
– Initialize thread attributes, such as:
    Schedule priority
    Stack size
    State
• pthread_create
– Start new thread inside the process.
– We specify what function to call when thread starts, along with
the necessary parameter
– The thread is due to terminate when its function returns
• pthread_join
– Allows us to wait for a child thread to finish
Example code
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int sum;                  /* shared by the threads of this process */

void *fun(void *param)    /* thread body */
{
    /* compute a sum, store in global variable: */
    int upper = atoi((char *)param);
    sum = 0;
    for (int i = 0; i <= upper; i++)
        sum += i;
    pthread_exit(0);      /* thread terminates when its function returns */
}

int main(int argc, char *argv[])
{
    pthread_t tid;        /* thread identifier */
    pthread_attr_t attr;  /* thread attributes */

    pthread_attr_init(&attr);
    pthread_create(&tid, &attr, fun, argv[1]);
    pthread_join(tid, NULL);   /* wait for the child thread */
    printf("%d\n", sum);
    return 0;
}
Java threads
• Managed by the Java virtual machine
• Two ways to create threads
1. Create a class that extends the Thread class
–
Put code inside public void run( )
2. Implement the Runnable interface
–
public void run( )
• Parent thread (e.g. in main() …)
–
–
Create thread object – just binds name of thread
Call start( ) – creates actual running thread, goes to run( )
See book example
Skeletons
class Worker extends Thread {
    public void run() {
        // do stuff
    }
}

public class Driver {
    // in main method:
    Worker w = new Worker();
    w.start();
    // ... continue / join
}

class Worker2 implements Runnable {
    public void run() {
        // do stuff
    }
}

public class Driver2 {
    // in main method:
    Runnable w2 = new Worker2();
    Thread t = new Thread(w2);
    t.start();
    // ... continue / join
}
Java thread states
• This will probably sound familiar!
• New
– From here, go to “runnable” at call to start( )
• Runnable
– Go to “blocked” if need I/O or going to sleep
– Go to “dead” when we exit run( )
– Go to “waiting” if we call join( ) for child thread
• Blocked
– Go to “runnable” when I/O is serviced
• Waiting
• Dead
CS 346 – Sect. 5.1-5.2
• Process synchronization
– What is the problem?
– Criteria for solution
– Producer / consumer example
– General problems difficult because of subtleties
Problem
• It’s often desirable for processes/threads to share data
– Can be a form of communication
– One may need data being produced by the other
• Concurrent access  possible data inconsistency
• Need to “synchronize”…
– HW or SW techniques to ensure orderly execution
• Bartender & drinker
– Bartender takes empty glass and fills it
– Drinker takes full glass and drinks contents
– What if drinker overeager and starts drinking too soon?
– What if drinker not finished when bartender returns?
– Must ensure we don’t spill on counter.
Key concepts
• Critical section = code containing access to shared data
– Looking up a value or modifying it
• Race condition = situation where outcome of code
depends on the order in which processes take turns
– The correctness of the code should not depend on scheduling
• Simple example: producer / consumer code, p. 204
– Producer adds data to buffer and executes ++count;
– Consumer grabs data and executes --count;
– Assume count initially 5.
– Let’s see what could happen…
Machine code
Producer’s ++count becomes:
  1   r1 = count
  2   r1 = r1 + 1
  3   count = r1

Consumer’s --count becomes:
  4   r2 = count
  5   r2 = r2 – 1
  6   count = r2
Does this code work?
Yes, if we execute in order 1,2,3,4,5,6 or 4,5,6,1,2,3 -- see why?
Scheduler may have other ideas!
Alternate schedules
Schedule 1 (order 1, 2, 4, 5, 3, 6):
  1   r1 = count
  2   r1 = r1 + 1
  4   r2 = count
  5   r2 = r2 – 1
  3   count = r1
  6   count = r2

Schedule 2 (order 1, 2, 4, 5, 6, 3):
  1   r1 = count
  2   r1 = r1 + 1
  4   r2 = count
  5   r2 = r2 – 1
  6   count = r2
  3   count = r1
• What are the final values of count?
• How could these situations happen?
• If the updating of a single variable is nontrivial, you
can imagine how critical the general problem is!
Solution criteria
• How do we know we have solved a synchronization
problem? 3 criteria:
• Mutual exclusion – Only 1 process may be inside its
critical section at any one time.
– Note: For simplicity we’re assuming there is one zone of shared
data, so each process using it has 1 critical section.
• Progress – Don’t hesitate to enter your critical section if
no one else is in theirs.
– Avoid an overly conservative solution
• Bounded waiting – There is a limit on # of times you may
access your critical section if another is still waiting to
enter theirs.
– Avoid starvation
Solution skeleton
while (true)
{
Seek permission to enter critical section
Do critical section
Announce done with critical section
Do non-critical code
}
• BTW, easy solution is to forbid preemption.
– But this power can be abused.
– Identifying critical section → can avoid preemption for a shorter
period of time.
CS 346 – Sect. 5.3-5.7
• Process synchronization
– A useful example is “producer-consumer” problem
– Peterson’s solution
– HW support
– Semaphores
– “Dining philosophers”
• Commitment
– Compile and run semaphore code from os-book.com
Peterson’s solution
… to the 2-process producer/consumer problem. (p. 204)
while (true)
{
    ready[me] = true
    turn = other
    while (ready[other] && turn == other)
        ;   // busy wait
    Do critical section
    ready[me] = false
    Do non-critical code
}
// Don’t memorize but think: Why does this ensure mutual exclusion?
// What assumptions does this solution make?
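One answer to the assumptions question: Peterson’s solution assumes
loads and stores happen in program order and become visible at once.
A C11 sketch using sequentially consistent atomics, which restore that
assumption on modern hardware (plain variables would get reordered by
compiler and CPU):

    #include <stdatomic.h>
    #include <stdbool.h>

    atomic_bool ready[2];     /* ready[i]: thread i wants in */
    atomic_int turn;          /* whose turn to defer to */

    void enter(int me)        /* me is 0 or 1 */
    {
        int other = 1 - me;
        atomic_store(&ready[me], true);   /* I want in */
        atomic_store(&turn, other);       /* but you go first */
        while (atomic_load(&ready[other]) && atomic_load(&turn) == other)
            ;                             /* busy wait */
    }

    void leave(int me)
    {
        atomic_store(&ready[me], false);  /* done with critical section */
    }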
HW support
• As we mentioned before, we can disable interrupts
– → No one can preempt me.
– Disadvantages
• The usual way to handle synchronization is by careful
programming (SW)
• We require some atomic HW operations
– A short sequence of assembly instructions guaranteed to be
non-interruptable
– This keeps non-preemption duration to absolute minimum
– Access to “lock” variables visible to all threads
– e.g. swapping the values in 2 variables
– e.g. get and set some value (aka “test and set”)
Semaphore
• Dijkstra’s solution to mutual exclusion problem
• Semaphore object
– integer value attribute ( > 0 means resource is available)
– acquire and release methods
• Semaphore variants: binary and counting
– Binary semaphore aka “mutex” or “mutex lock”
acquire() {
    if (value <= 0)
        wait / sleep
    --value
}

release() {
    ++value
    // wake a sleeper
}
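POSIX exposes Dijkstra’s operations as sem_wait (acquire) and sem_post
(release). A small sketch protecting a shared counter with a binary
semaphore:

    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    sem_t mutex;                       /* binary semaphore guarding count */
    int count = 0;                     /* shared data */

    void *worker(void *arg)
    {
        for (int i = 0; i < 100000; i++) {
            sem_wait(&mutex);          /* acquire: blocks while value is 0 */
            ++count;                   /* critical section */
            sem_post(&mutex);          /* release: wakes a sleeper, if any */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        sem_init(&mutex, 0, 1);        /* initial value 1 = available */
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("count = %d\n", count); /* always 200000 with the semaphore */
        return 0;
    }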
Deadlock / starvation
• After we solve a mutual exclusion problem, also need to
avoid other problems
– Another way of expressing our synchronization goals
• Deadlock: 2+ process waiting for an event that can only
be performed by one of the waiting processes
– the opposite of progress
• Starvation: being blocked for an indefinite or unbounded
amount of time
– e.g. Potentially stuck on a semaphore wait queue forever
Bounded-buffer problem
• aka “producer-consumer”. See figures 5.9 – 5.10
• Producer class
– run( ) to be executed by a thread
– Periodically call insert( )
• Consumer class
– Also to be run by a thread
– Periodically call remove( )
• BoundedBuffer class
– Creates semaphores (mutex, empty, full): why 3?
Initial values: mutex = 1, empty = SIZE, full = 0
– Implements insert( ) and remove( ).
These methods contain calls to semaphore operations acquire( )
and release( ).
Insert & delete
public void insert(E item)
{
    empty.acquire();     // wait for a free slot
    mutex.acquire();
    // add an item to the buffer...
    mutex.release();
    full.release();      // announce one more full slot
}

public E remove()
{
    full.acquire();      // wait for a full slot
    mutex.acquire();
    // remove item ...
    mutex.release();
    empty.release();     // announce one more free slot
}
• What are we doing with the semaphores?
Readers/writers problem
• More general than producer-consumer
• We may have multiple readers and writers of shared info
• Mutual exclusion requirement:
Must ensure that writers have exclusive access
• It’s okay to have multiple readers reading
See example solution, Fig. 5.10 – 5.12
• Reader and Writer threads periodically want to execute.
– Operations guarded by semaphore operations
• Database class (analogous to BoundedBuffer earlier)
– readerCount
– 2 semaphores: one to protect database, one to protect the
updating of readerCount
Solution outline
Reader:
    mutex.acquire();
    ++readerCount;
    if (readerCount == 1)     // first reader locks out writers
        db.acquire();
    mutex.release();
    // READ NOW
    mutex.acquire();
    --readerCount;
    if (readerCount == 0)     // last reader lets writers back in
        db.release();
    mutex.release();

Writer:
    db.acquire();
    // WRITE NOW
    db.release();
Example output
writer 0 wants to write.
writer 0 is writing.
writer 0 is done writing.
reader 2 wants to read.
writer 1 wants to write.
reader 0 wants to read.
reader 1 wants to read.
Reader 2 is reading. Reader count = 1
Reader 0 is reading. Reader count = 2
Reader 1 is reading. Reader count = 3
writer 0 wants to write.
Reader 1 is done reading. Reader count = 2
Reader 2 is done reading. Reader count = 1
Reader 0 is done reading. Reader count = 0
writer 1 is writing.
reader 0 wants to read.
writer 1 is done writing.
CS 346 – Sect. 5.7-5.8
• Process synchronization
– “Dining philosophers” (Dijkstra, 1965)
– Monitors
Dining philosophers
• Classic OS problem
– Many possible solutions depending on how foolproof you want
solution to be
• Simulates synchronization situation of several resources,
and several potential consumers.
• What is the problem?
• Model chopsticks with semaphores – available or not.
– Initialize each to be 1
• Achieve mutual exclusion:
– acquire left and right chopsticks (numbered i and i+1)
– Eat
– release left and right chopsticks
• What could go wrong?
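A sketch of the naive solution just outlined, one semaphore per
chopstick; if all N philosophers grab their left chopstick at the same
moment, every right chopstick is taken and we deadlock, which is one
answer to the question above:

    #include <semaphore.h>

    #define N 5
    sem_t chopstick[N];     /* each initialized to 1 (available) via sem_init */

    void philosopher(int i) /* run by thread i */
    {
        for (;;) {
            sem_wait(&chopstick[i]);            /* pick up left chopstick */
            sem_wait(&chopstick[(i + 1) % N]);  /* pick up right chopstick */
            /* eat */
            sem_post(&chopstick[i]);            /* put both back */
            sem_post(&chopstick[(i + 1) % N]);
            /* think */
        }
    }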
DP (2)
• What can we say about this solution?
mutex.acquire();
Acquire 2 neighboring forks
Eat
Release the 2 forks
mutex.release();
• Other improvements:
– Ability to see if either neighbor is eating
– May make more sense to associate semaphore with the
philosophers, not the forks. A philosopher should block if cannot
acquire both forks.
– When done eating, wake up either neighbor if necessary.
Monitor
• Higher level than semaphore
– Semaphore coding can be buggy
• Programming language construct
– Special kind of class / data type
– Hides implementation detail
• Automatically ensures mutual exclusion
– Only 1 thread may be “inside” monitor at any one time
– Attributes of monitor are the shared variables
– Methods in monitor deal with specific synchronization problem.
This is where you access shared variables.
– Constructor can initialize shared variables
• Supported by a number of HLLs
– Concurrent Pascal, Java, C#
Condition variables
• With a monitor, you get mutual exclusion
• If you also want to ensure against deadlock or starvation,
you need condition variables
• Special data type associated with monitors
• Declared with other shared attributes of monitor
• How to use them:
– No attribute value to manipulate. 2 functions only:
– Wait: if you call this, you go to sleep. (Enter a queue)
– Signal: means you release a resource, waking up a thread
waiting for it.
– Each condition variable has its own queue of waiting
threads/processes.
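Pthreads provides condition variables outside of a true monitor; a
sketch of the wait/signal pattern for a counted resource (the while
loop matters: a woken thread must recheck the condition, since another
thread may grab the resource first):

    #include <pthread.h>

    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;    /* the monitor's mutual exclusion */
    pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;  /* the condition variable */
    int available = 0;                                   /* shared state */

    void get_resource(void)
    {
        pthread_mutex_lock(&lock);                /* "enter the monitor" */
        while (available == 0)
            pthread_cond_wait(&nonempty, &lock);  /* sleep; lock released while waiting */
        --available;
        pthread_mutex_unlock(&lock);
    }

    void put_resource(void)
    {
        pthread_mutex_lock(&lock);
        ++available;
        pthread_cond_signal(&nonempty);           /* wake one waiting thread */
        pthread_mutex_unlock(&lock);
    }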
Signal( )
• A subtle issue for signal…
• In a monitor, only 1 thread may be running at a time.
• Suppose P calls x.wait( ). It’s now asleep.
• Later, Q calls x.signal( ) in order to yield resource to P.
• What should happen? 3 design alternatives:
– “blocking signal” – Q immediately goes to sleep so that P can
continue.
– “nonblocking signal” – P does not actually resume until Q has left
the monitor
– Compromise – Q immediately exits the monitor.
• Whoever gets to continue running may have to go to
sleep on another condition variable.
CS 346 – Sect. 5.9
• Process synchronization
– “Dining philosophers” monitor solution
– Java synchronization
– atomic operations
Monitor for DP
• Figure 5.18 on page 228
• Shared variable attributes:
– state for each philosopher
– “self” condition variable for each philosopher
• takeForks( )
– Declare myself hungry
– See if I can get the forks. If not, go to sleep.
• returnForks( )
– Why do we call test( )?
• test( )
– If I’m hungry and my neighbors are not eating, then I will eat and
leave the monitor.
Synch in Java
• “thread safe” = data remain consistent even if we have
concurrently running threads
• If waiting for a (semaphore) value to become positive
– Busy waiting loop 
– Better: Java provides Thread.yield( ): “block me”
• But even “yielding” ourselves can cause livelock
– Continually attempting an operation that fails
– e.g. You wait for another process to run, but the scheduler
keeps scheduling you instead because you have higher priority
Synchronized
• Java’s answer to synchronization is the keyword
synchronized – qualifier for method
as in public synchronized void funName(params) { …
• When you call a synchronized method belonging to an
object, you obtain a “lock” on that object
e.g. sem.acquire();
• Lock automatically released when you exit method.
• If you try to call a synchronized method, & the object is
already locked by another thread, you are blocked and
sent to the object’s entry set.
– Not quite a queue. JVM may arbitrarily choose who gets in next
Avoid deadlock
• Producer/consumer example
– Suppose buffer is full. Producer now running.
– Producer calls insert( ). Successfully enters method  has lock
on the buffer. Because buffer full, calls Thread.yield( ) so that
consumer can eat some data.
– Consumer wakes up, but cannot enter remove( ) method
because producer still has lock.  we have deadlock.
• Solution is to use wait( ) and notify( ).
– When you wait, you release the lock, go to sleep (blocked), and
enter the object’s wait set. Not to be confused with entry set.
– When you notify, JVM picks a thread T from the wait set and
moves it to entry set. T now eligible to run, and continues from
point after its call to wait().
notifyAll
• Put every waiting thread into the entry set.
– Good idea if you think > 1 thread may be waiting.
– Now, all these threads compete for next use of synchronized
object.
• Sometimes, just calling notify can lead to deadlock
– Book’s doWork example
– Threads are numbered
– doWork has a shared variable turn. You can only do work here if
it’s your turn: if turn == your number.
– Thread 3 is doing work, sets turn to 4, and then leaves.
– But thread 4 is not in the wait set. All other threads will go to
sleep.
More Java support
See: java.util.concurrent
• Built-in ReentrantLock class
– Create an object of this class; call its lock and unlock
methods to access your critical section (p. 282)
– Allows you to set priority to waiting threads
• Condition interface (condition variable)
– Meant to be used with a lock. What is the goal?
– await( ) and signal( )
• Semaphore class
– acquire( ) and release( )
Atomic operations
• Behind the scenes, need to make sure instructions are
performed in appropriate order
• “transaction” = 1 single logical function performed by a
thread
– In this case, involving shared memory
– We want it to run atomically
• As we perform individual instructions, things might go
smoothly or not
– If all ok, then commit
– If not, abort and “roll back” to earlier state of computation
• This is easier if we have fewer instructions in a row to do
Keeping the order
Schedule 1 (serial):
    T1: Read(A), Write(A), Read(B), Write(B)
    then
    T2: Read(A), Write(A), Read(B), Write(B)

Schedule 2 (interleaved):
    T1: Read(A), Write(A)
    T2: Read(A), Write(A)
    T1: Read(B), Write(B)
    T2: Read(B), Write(B)
• Are these two schedules equivalent? Why?
CS 346 – Chapter 6
• CPU scheduling
– Characteristics of jobs
– Scheduling criteria / goals
– Scheduling algorithms 
– System load
– Implementation issues
– Real-time scheduling
Schedule issues
• Multi-programming is good! → better CPU utilization
• CPU burst concept
– Jobs typically alternate between work and wait
– Fig. 6.2: Distribution has long tail on right.
General questions
• How or when does a job enter the ready queue?
• How much time can a job use the CPU?
• Do we prioritize jobs?
• Do we pre-empt jobs?
• How do we measure overall performance?
Scheduler
• Makes short-term decisions
– When? Whenever a job changes state (becomes ready, needs
to wait, finishes)
– Selects a job on the ready queue
– Dispatcher can then do the “context switch” to give CPU to new
job
• Should we preempt?
– Non-preemptive = Job continues to execute until it has to wait or
finishes
– Preemptive = Job may be removed from CPU while doing work!
– When you preempt: need to leave CPU “gracefully”. May be in
the middle of a system call or modifying shared data. Often we
let that operation complete.
Scheduling criteria
• CPU utilization = what % of time CPU is executing
instructions
• Throughput = # or rate of jobs completed in some time
period
• Turnaround time = (finish time) – (request time)
• Waiting time = how long spent in ready state
– Confusing name!
• Response time = how long after request that a job
begins to produce output
• Usually, we want to optimize the “average” of each
measure. e.g. Reduce average turnaround time.
Some scheduling algorithms
• First-come, first-served
• Round robin
– Like FCFS, but each job has a limited time quantum
• Shortest job next
• We use a Gantt chart to view and evaluate a schedule
– e.g. compute average turnaround time
• Often, key question is – in what order do we execute
jobs?
• Let’s compare FCFS and SJN…
Example 1
Process number    Time of request    Execution time needed
      1                  0                    20
      2                  5                    30
      3                 10                    40
      4                 20                    10
• First-come, first-served
– Process 1 can execute from t=0 to t=20
– Process 2 can execute from t=20 to t=50
– Process 3 can execute from t=50 to t=90
– Process 4 can execute from t=90 to t=100
• We can enter this info as extra columns in the table.
• What is the average turnaround time?
• What if we tried Shortest Job Next?
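A small helper showing the turnaround computation; for the FCFS
schedule above it gives ((20-0) + (50-5) + (90-10) + (100-20)) / 4 =
56.25:

    /* Average turnaround time = mean of (finish - request) over all jobs. */
    double avg_turnaround(const int request[], const int finish[], int n)
    {
        double total = 0;
        for (int i = 0; i < n; i++)
            total += finish[i] - request[i];
        return total / n;
    }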
Example 2
Process number    Time of request    Execution time needed
      1                  0                    10
      2                 30                    30
      3                 40                    20
      4                 50                     5
Note that it’s possible to have idle time.
System load
• A measure of how “busy” the CPU is
• At an instant: how many tasks are currently running or
ready.
– If load > 1, the system is “overloaded”, and work is backing up.
• Typically reported as an average of the last 1, 5, or 15
minutes.
• Based on the schedule, can calculate average load as
well as maximum (peak) load.
Example 1
Job #    Request    Exec    Start    Finish
  1         0         20      0        20
  2         5         30     20        50
  3        10         40     50        90
  4        20         10     90       100

“Request time” aka “Arrival time”
• FCFS schedule can also be depicted this way (each cell = 5 time
units; X = executing, R = ready/waiting):

  P1:  X X X X
  P2:  . R R R X X X X X X
  P3:  . . R R R R R R R R X X X X X X X X
  P4:  . . . . R R R R R R R R R R R R R R X X

• What can we say about the load?
Example 2

Job #    Request    Exec    Start    Finish
  1         0         10      0        10
  2        30         30     30        60
  3        40         20     65        85
  4        50          5     60        65

• SJN schedule can be depicted this way (each cell = 5 time units;
X = executing, R = ready):

  P1:  X X
  P2:  . . . . . . X X X X X X
  P3:  . . . . . . . . R R R R R X X X X
  P4:  . . . . . . . . . . R R X

• Load?
Preemptive SJN
• If a new job arrives with a shorter execution time (CPU
burst length) than currently running process, preempt!
• Could also call it “shortest remaining job next”
• Let’s redo previous example allowing preemption
– Job #1 is unaffected.
– Job #2 would have run from 30 to 60, but …
Job #    Request    Exec    Start    Finish
  1         0         10      0        10
  2        30         30      ?         ?
  3        40         20      ?         ?
  4        50          5      ?         ?
– Does preemption reduce average turnaround time? Load?
Estimating time
• Some scheduling algorithms like SJN need a job’s
expected CPU time
• We’re interested in scheduling bursts of CPU time, not
literally the entire job.
• OS doesn’t really know in advance how much of a
“burst” will be needed. Instead, we estimate.
• Exponential averaging method. We predict the next
CPU burst will take this long:
p_(n+1) = a * t_n + (1 – a) * p_n
where t_n = actual time of the nth burst, p_n = previous prediction
• Formula allows us to weight recent vs. long-term history.
– What if a = 0 or 1?
Estimating time (2)
• p_(n+1) = a * t_n + (1 – a) * p_n
• Why is it called “exponential”? Becomes clearer if we substitute all
the way back to the first burst.
• p_1 = a * t_0 + (1 – a) * p_0
• p_2 = a * t_1 + (1 – a) * p_1
      = a * t_1 + (1 – a) [ a * t_0 + (1 – a) * p_0 ]
      = a * t_1 + (1 – a) * a * t_0 + (1 – a)^2 * p_0
• A general formula for p_(n+1) will eventually contain terms of the
form (1 – a) raised to various powers.
– In practice, we just look at previous actual vs. previous prediction
• Book’s example Figure 6.3: Prediction eventually
converges to correct recent behavior.
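The update itself is one line of C; for instance, with a = 0.5 and
p0 = 10, actual bursts of 6, 4, 6 yield predictions 8, 6, 6:

    /* Exponential averaging: next prediction from the previous
       prediction and the burst we just observed. */
    double predict_next(double prev_prediction, double actual_burst, double a)
    {
        return a * actual_burst + (1.0 - a) * prev_prediction;
    }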
Priority scheduling
• SJN is a special case of a whole class of scheduling
algorithms that assign priorities to jobs.
• Each job has a priority value:
– Convention: low number = “high” priority
• SJN: priority = next predicted burst time
• Starvation: Some “low priority” jobs may never execute
– How could this happen?
• Aging: modify SJN so that while a job waits, it gradually
“increases” its priority so it won’t starve.
Round robin
• Each job takes a short turn at the CPU
• Commonly used, easy for OS to handle
• Time quantum typically 10-100 ms – it’s constant
• Choice of time quantum has a minor impact on turnaround time
(Figure 6.5)
– Can re-work an earlier example
• Questions to think about:
• If there are N jobs, what is the maximum wait time before
you can start executing?
• What happens if the time quantum is very large?
• What happens if the time quantum is very short?
Implementation issues
• Multi-level ready queue
• Threads
• Multi-processor scheduling
Multi-level queue
• We can assign jobs to different queues based on their
purpose or priority
• Foreground / interactive jobs may deserve high priority to
please the user
– Also: real-time tasks
• Background / routine tasks can be given lower priority
• Each queue can have its own scheduling regime, e.g.
round robin instead of SJN
– Interactive jobs may have unpredictable burst times
• Key issue: need to schedule among the queues
themselves. How?
Scheduling among queues
• Classify jobs according to purpose  priority
• Priority based queue scheduling
– Can’t run any Priority 2 job until all Priority 1 jobs done.
– While running Priority 2 job, can preempt if a Priority 1 job
arrives.
– Starvation
• Round robin with different time quantum for each queue
• Time share for each queue
– Decreasing % of time for lower priorities
• Or… Multi-level feedback queue (pp. 275-277)
– All jobs enter at Priority 0. Given short time quantum.
– If not done, enter queue for Priority 1 jobs. Longer quantum next
time.
Thread scheduling
• The OS schedules “actual” kernel-level threads
• The thread library must handle user threads
– One-to-one model – easy, each user thread is already a kernel
thread. Direct system call
– Many-to-many or many-to-one models
• Thread library has 1 or a small number of kernel threads
available.
• Thread library must decide when user thread should run on a
true kernel thread.
• Programmer can set a priority for thread library to consider.
In other words, threads of the same process are competing
among themselves.
Multi-processing
• More issues to address, more complex overall
• Homogeneous system = identical processors, job can
run on any of them
• Asymmetric approach = allocate 1 processor for the OS,
all others for user tasks
– This “master server” makes decisions about what jobs run on the
other processors
• Symmetric approach (SMP) = 1 scheduler for each
processor, usually separate ready queue for each (but
could have a common queue for all)
• Load balancing: periodically see if we should “pull” or
“push” jobs
Affinity
• When switched out, a job may want to return next time to
the same processor as before
– Why desirable?
– An affinity policy may be “soft” or “hard”.
– Soft = OS will try but not guarantee.
Why might an OS prefer to migrate a process to a different
processor?
• Generalized concept: processor set
– For each process, maintain a list of processors it may be run on
• Memory system can exploit affinity, allocating more
memory that is closer to the favorite CPU. (Fig. 6.9)
Multicore processor
• Conceptually similar to multiprocessors
– Place multiple “processor cores” on same chip
– Faster, consume less power
– OS treats each core like a unique CPU
• However, the cores often share cache memory
– Leads to more “cache misses” → jobs spend more time stalled
waiting for instructions or data to arrive
– OS can allocate 2 threads to the same core, to increase
processor utilization
– Fig. 6.11 shows idealized situation. What happens in general?
Real-time Scheduling
• Real-time scheduling
– Earliest Deadline First
– Rate Monotonic
• What is this about?
– Primary goal is avoid missing deadlines. Other goals may
include having response times that are low and consistent.
– We’re assuming jobs are periodic, and the deadline of a job is
the end of a period
Real-time systems
• Specialized operating system
• All jobs potentially have a deadline
– Correctness of operation depends on meeting deadlines, in
addition to correct algorithm
– Often, jobs are periodic; some may be aperiodic/sporadic
• Hard real-time = missing a deadline is not acceptable
• Soft real-time = deadline miss not end of world, but try to
minimize
– Number of acceptable deadline misses is a design parameter
– We try to measure Quality of Service (QoS)
– Examples?
• Used in defense, factories, communications, multimedia;
embedded in appliances
Features
• A real-time system may be used to control specific
device
– Opening bomb bay door
– When to release chocolate into vat
• Host device typically very small and lacks features of
PC, greatly simplifying OS design
– Single user or no user
– Little or no memory hierarchy
– Simple instruction set (or not!)
– No disk drive, monitor
– Cheap to manufacture, mass produce
Scheduling
• Most important issue in real-time systems is CPU
scheduling
• System needs to know the WCET (worst-case execution time) of jobs
• Jobs are given priority based on their timing/deadline
needs
• Jobs may be pre-empted
• Kernel jobs (implemented system calls) contain many
possible preemption points at which they may be safely
suspended
• Want to minimize latency
– System needs to respond quickly to external event, such as
change in temperature
– Interrupt must have minimum overhead – how to measure it?
EDF
• Given a set of jobs
– Need to know period and execution time of each
– Each job contributes to the CPU’s utilization: execution time
divided by the period
– If the total utilization of all jobs > 1, no schedule possible!
• At each scheduling checkpoint, choose the job with the
earliest deadline.
– A scheduling checkpoint occurs at t = 0, when a job begins
period or is finished, or when a new job arrives into the system
– If no new jobs enter the system, EDF is non-preemptive
– Sometimes the CPU is idle
• Need to compute schedule one dynamic job at a time
until you reach the LCM of the job periods
– Can predict deadline miss, if any
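A toy EDF simulator for the A/B example on the next slide (a sketch:
jobs start at t = 0, and ties are broken by job index, so the trace
may differ in order from the worked example while meeting the same
deadlines):

    #include <stdio.h>

    #define NJOBS 2
    int period[NJOBS] = {10, 15};   /* jobs A and B from the next slide */
    int exec_t[NJOBS] = {5, 6};

    int main(void)
    {
        int remaining[NJOBS], deadline[NJOBS];
        for (int j = 0; j < NJOBS; j++) {
            remaining[j] = exec_t[j];           /* work left this period */
            deadline[j]  = period[j];           /* end of current period */
        }
        for (int t = 0; t < 30; t++) {          /* 30 = LCM of the periods */
            int pick = -1;
            for (int j = 0; j < NJOBS; j++)     /* ready job, earliest deadline */
                if (remaining[j] > 0 && (pick < 0 || deadline[j] < deadline[pick]))
                    pick = j;
            printf("t=%2d: %c\n", t, pick < 0 ? '-' : 'A' + pick);
            if (pick >= 0)
                remaining[pick]--;
            for (int j = 0; j < NJOBS; j++)     /* period boundary = checkpoint */
                if ((t + 1) % period[j] == 0) {
                    if (remaining[j] > 0)
                        printf("  job %c missed its deadline!\n", 'A' + j);
                    remaining[j] = exec_t[j];   /* new period: replenish work */
                    deadline[j] += period[j];
                }
        }
        return 0;
    }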
EDF example
• Suppose we have 2 jobs, A and B, with periods 10 and
15, and execution times 5 and 6.
• At t = 0, we schedule A because its deadline is earlier
(10 < 15).
• At t = 5, A is finished. We can now schedule B.
• At t = 11, B is finished. A has already started a new
period, we can schedule it immediately.
• At t = 16, A is finished. B already started a new period,
so schedule it.
• At t = 22, B is finished. Schedule A.
• At t = 27, A is finished, and CPU is idle until t = 30.
EDF: be careful
• At certain scheduling checkpoints, you need to schedule
the job with the earliest deadline.
– As long as that job has started its period.
– Do each cycle iteratively until the LCM of the job periods.
• Checkpoints include
– t=0
– Whenever a job is finished executing
– Whenever a job begins its period
(This condition is important when we have maximum utilization.)
• Example with 2 jobs
– Job A has period 6 and execution time 3.
– Job B has period 14 and execution time 7.
– U = 1. We should be able to schedule these jobs with EDF.
Example: EDF
• Wrong way: ignoring beginning of job periods
– At t = 0, we see jobs A (period 0-6) and B (period 0-14)
Since A has sooner deadline, schedule A for its 3 cycles.
– At t = 3, we see jobs A (period 6-12) and B (period 0-14)
Since A hasn’t started its period, our only choice is B, for its 7
cycles.
– At t = 10, we have job A (period 6-12) and B (period 14-28)
A has sooner deadline. Schedule A for its 3 cycles.
– At t = 13, A is finished but it missed its deadline. We don’t want
this to happen!
continued
• Job A = (per 6, exec 3) Job B = (per 14, exec 7)
• Correct EDF schedule that takes into account the start of
a job period as another scheduling checkpoint
  t:      0-3   3-6   6-9   9-13  13-16  16-18  18-21  21-26  26-29  29-30
  runs:    A     B     A     B      A      B      A      B      A      B

  t:     30-33  33-36  36-39  39-42
  runs:    A      B      A      B
• Notice:
– At t = 12 and t = 24, we don’t preempt job B, because B’s
deadline is sooner. In the other cases when A’s period begins, A
takes the higher priority.
RM
• Most often used because it’s easy
• Inherently preemptive
• Assign each job a fixed priority based on its period
– The shorter the period, the more often this job must execute, the
more deadlines it has  the higher the priority
• Determine in advance the schedule of the highest priority
job
– Continue for other jobs in descending order of priority
– Be sure not to “schedule” a job before its period begins
• Less tedious than EDF to compute entire schedule
– For highest priority job, you know exactly when it will execute
– Other jobs may be preempted by higher priority jobs that were
scheduled first
RM (2)
• Sometimes not possible to find a schedule
– Our ability to schedule is more limited than EDF.
• There is a simple mathematical check to see if a RM
schedule is possible:
– We can schedule if the total utilization U ≤ n(2^(1/n) – 1).
Proved by Liu and Layland in 1973.
– If n(2^(1/n) – 1) < U ≤ 1, the test is inconclusive. Must compute
the schedule to find out.
– Ex. If n = 2, we are guaranteed to find a RM schedule if U ≤ 0.828;
above that (e.g. U = 90%) it gets risky.
– Large experiments using random job parameters show that RM is
reliable up to about 88% utilization.
   n      P(RM)
   1      1.000
   2      0.828
   3      0.780
   4      0.757
   5      0.743
   6      0.735
   7      0.729
   8      0.724
   ∞      0.693
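The Liu & Layland test is a one-liner in C; this reproduces the
table’s P(RM) values (e.g. 0.828 for n = 2):

    #include <math.h>

    /* 1 = RM schedule guaranteed; 0 = inconclusive (must compute the
       schedule, assuming U <= 1 in the first place). */
    int rm_guaranteed(double U, int n)
    {
        double bound = n * (pow(2.0, 1.0 / n) - 1.0);
        return U <= bound;
    }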
RM Example
• Suppose we have 2 jobs, C and D, with periods of 2 and
3, both with execution time 1.
• U = 1/2 + 1/3 ≈ 0.83 > 0.828, so RM is risky. Let’s try it…
• Schedule the more frequent job first (C gets the first unit of each
period of 2):
  t:   0  1  2  3  4  5  6  7  8  9 10 11 12 13
       C     C     C     C     C     C     C
• Then schedule job D into the remaining slots (one unit in each
period of 3):
  t:   0  1  2  3  4  5  6  7  8  9 10 11 12 13
       C  D  C  D  C  -  C  D  C  D  C  -  C  D
• Looks okay!
RM Example 2
• Let’s look at earlier set of tasks, A and B, with periods of
10 and 15, and execution times of 5 and 6.
• U = 5/10 + 6/15 = 0.9, also risky.
• Schedule task A first: A runs at t = 0-5, 10-15, 20-25
(hyperperiod 30).
• Schedule task B into available spaces: t = 5-10, 15-20, 25-30.
• But B’s first period ends at t = 15, and by then B has received
only 5 of its 6 units → deadline miss. (U = 0.9 exceeded the 0.828
bound.)
Comparison
• Consider this set of jobs
Job #    Period    Execution time
  1        10            3
  2        12            4
  3        15            5
• What is the total utilization ratio? Are EDF and RM
schedules feasible?
• Handout
RM vs. EDF
• EDF
– Job’s priority is dynamic, hard to predict in advance
– Too democratic / egalitarian? Maybe we are trying to execute
too many jobs.
• RM
– Fixed priority is often desirable
– Higher priority job will have better response times overall, not
bothered by a lower priority job that luckily has an upcoming
deadline.
– RM cannot handle utilization up to 1 unless periods are in sync,
as in 1 : n1 : n1·n2 : n1·n2·n3 : …
(Analogy: Telling time is easy until you get to months/years.)
RM example
• Let’s return to previous example, this time using RM.
– Job A = (per 6, exec 3) Job B = (per 14, exec 7)
– Hyperperiod is 42
– First, we must schedule job A, because it has the shorter period.
A occupies t = 0-3, 6-9, 12-15, 18-21, … (the first 3 units of each
period of 6).
– Next, schedule job B into the remaining slots:
  t:     0-3   3-6   6-9   9-12  12-15
  runs:   A     B     A     B      A
– Uh-oh! During B’s period 0-14, it is only able to execute for 6
cycles. Deadline miss  This job set cannot be scheduled. But
it could if either job’s execution time were less  reduce U.
RM Utilization bound
• Liu and Layland (1973): “Scheduling Algorithms for
Multiprogramming in a Hard Real-Time Environment”
– They first looked at the case of 2 jobs. What is the maximum
CPU utilization that RM will always work? Express U as a
function of 1 of the job’s execution time, assuming the other job
will fully utilize the CPU during its period.
• We have 2 jobs
– Job j1 has period T1 = 8
Job j2 has period T2 = 12
– Let’s see what execution times C1 and C2 we can have, and
what effect this has on the CPU utilization.
– During one of j2’s periods, how many times will j1 start?
In general: ceil(T2/T1). In our case, ceil(12/8) = 2.
– They derive formulas to determine C2 and U, once we decide on
a value of C1.
continued
• We have job j1 (T1 = 8)
• Suppose C1 = 2.
and job j2 (T2 = 12)
– C2 = T2 – C1 * (number of times j1 starts)
= T2 – C1 * ceil (T2 / T1)
= 12 – 2 ceil (12 / 8) = 8.
– We can compute U = 2/8 + 8/12 = 11/12.
• Suppose C1 = 4
– C2 = 4
– U = 4/8 + 4/12 = 5/6
– The CPU utilization is actually lower as we increase the
execution time of j1.
• … If the last execution of j1 spills over into the next
period of j2, the opposite trend occurs.
Formula
• Eventually, Liu and Layland derive this general formula
for maximum utilization for 2 jobs:
U = 1 – x(1 – x)/(W + x)
where W = floor(T2/T1)
and x = T2/T1 – floor(T2/T1)
• We want to minimize U: to find at what level we can
guarantee schedulability. In this case W = 1, so
U = 1 – x(1 – x) / (1 + x)
• Setting the derivative equal to 0, we get x = √2 – 1, and
U(√2 – 1) = 2(√2 – 1) = about 0.83
• Result can be generalized to n jobs: U = n(2^(1/n) – 1)
CS 346 – Chapter 7
• Deadlock
– Properties
– Analysis: directed graph
• Handle
– Prevent
– Avoid
    • Safe states and the Banker’s algorithm
– Detect
– Recover
Origins of deadlock
• System contains resources
• Processes compete for resources:
– request, acquire / use, release
• Deadlock occurs on a set of processes when each one is
waiting for some event (e.g. the release of a resource)
that can only be triggered by another deadlocked
process.
– e.g. P1 possesses the keyboard, and P2 has the printer. P1
requests the printer and goes to sleep waiting. P2 requests the
keyboard and goes to sleep waiting.
– Sometimes hard to detect because it may depend on the order in
which resources are requested/allocated
Necessary conditions
4 conditions to detect for deadlock:
• Mutual exclusion – when a resource is held, the process
has exclusive access to it
• Hold and wait – processes each hold 1+ resource while
seeking more
• No preemption – a process will not release a resource
unless it’s finished using it
• Circular wait
• The first 3 conditions are routine, so it’s the circular wait
that is usually the big problem.
– Model using a directed graph, and look for cycle
Directed graph
• A resource allocation graph is a formal way to show we
have deadlock
• Vertices include processes and resources
• Directed edges
– (P → R) means that process requests a resource
– (R → P) means that resource is allocated to process
• If a resource has multiple instances
– Multiple processes may request or be allocated the resource
– Intuitive, but make sure you don’t over-allocate
– e.g. Figure 7.2: Resource R2 has 2 instances which are both
allocated. But process P3 also wants some of R2. The “out”
degree of R2 is 2 and “in” degree is 1.
Examples
• R2 has 2 instances.
– We can have these edges: P1 → R2, P2 → R2, P3 → R2.
– What does this situation mean? What should happen next?
• Suppose R1 and R2 have 1 instance each.
– Edges: R1 → P1, R2 → P2, P1 → R2
– Describe this situation.
– Now, add this edge: P2 → R1
– Deadlock?
• Fortunately, not all cycles imply a deadlock.
– There may be sufficient instances to honor request
– Fig 7.3 shows a cycle. P1 waits for R1 and P3 waits for R2. But
either of these 2 resources can be released by processes that
are not in the cycle…. as long as they don’t run forever.
How OS handles
• Ostrich method – pretend it will never happen. Ignore
the issue. Let the programmer worry about it.
– Good idea if deadlock is rare.
• Dynamically prevent deadlock from ever occurring
– Allow up to 3 of the 4 necessary conditions to occur.
– Prevent certain requests from being made.
• A priori avoidance
– Require advance warning about requests, so that deadlock can
be avoided.
– Some requests are delayed
• Detection
– Allow conditions that create deadlock, and deal with it as it
occurs.
– Must be able to detect!
Prevention
• “An ounce of prevention is worth a pound of cure”:
Benjamin Franklin
• Take a look at each of the 4 necessary conditions. Don’t
allow it to be the 4th nail in the coffin.
1. Mutual exclusion
– Not much we can do here. Some resources must be exclusive.
– Which resources are sharable?
2. Hold & wait
– Could require a process to make all its requests at the
beginning of its execution.
– How does this help?
– Resource utilization; and starvation?
Prevention (2)
3. No resource preemption
– Well, we do want to allow some preemption
– If you make a resource request that can’t be fulfilled at the
moment, OS can require you to release everything you have.
(release = preempting the resource)
– If you make a resource request, and it’s held by a sleeping
process, OS can let you steal it for a while.
4. Circular wait
– System ensures the request doesn’t complete a cycle
– Total ordering technique: Assign a whole number to each
resource. Process must request resources in numerical order, or
at least not request a lower-numbered resource when it holds a
higher one.
– Fig. 7.2: P3 has resource #3 but also requests #2. OS could
reject this request.
Avoidance
• We need a priori information to avoid future deadlock.
• What information? We could require processes to
declare up front the maximum # of resources of each
type it will ever need.
• During execution: let’s define a resource-allocation
state, telling us:
– # of resources available (static)
– Maximum needs of each process (static)
– # allocated to each process (dynamic)
Safe state
• To be in a safe state, there must exist a safe sequence.
• A safe sequence is a list of processes [ P1, P2, … Pn ]
– for each P_i, we can satisfy P_i’s requests given whatever
resources are currently available or currently held by the
processes numbered lower than i (i.e. Pj where j < i) by letting
them finish.
– For example, all of P2’s possible requests can be met by either
what is currently available or by what is held by P1.
– If P3 needs a resource held by P2, can wait until P2 done, etc.
• Safe state = ∃ a safe sequence including all processes.
– Deadlock occurs only in an unsafe state.
• The system needs to examine each request and ensure that granting the allocation will preserve a safe state.
Example
• Suppose we have 12 instances of some resource.
• 3 processes have these a priori known needs
Process #    Max needs    Current use
    1            10            5
    2             4            2
    3             9            2
• We need to find some safe sequence of all 3 processes
• At present, 12 – (5 + 2 + 2) = 3 instances available.
• Is [ 1, 2, 3 ] a safe sequence?
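• Worked check: with 3 instances free, P1 could still request 10 – 5 = 5 more, so P1 cannot be guaranteed to finish first, and [ 1, 2, 3 ] is not a safe sequence. But P2 needs at most 4 – 2 = 2 ≤ 3: let it finish and reclaim its 2 instances (5 free); now P1’s worst case of 5 is covered (10 free once it finishes), and then P3’s remaining 7 is covered. So [ 2, 1, 3 ] is a safe sequence, and the state is safe.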
Banker’s algorithm
• General enough to handle multiple instances
• Principles
– No customer can borrow more money than is in the bank
– All customers given maximum credit limit at outset
– Can’t go over your limit!
– Sum of all loans never exceeds bank’s capital.
• Good news: customers’ aggregate credit limit may be
higher than bank’s assets
• Safe state: Bank always has enough “money” on hand to cover the remaining credit of at least one customer at a time.
• Algorithm: satisfy a request only if you stay safe
– Identify which job has smallest remaining requests, and make
sure we always have enough dough
Example
• Consider 10 devices of the same type
• Processes 1-3 need up to 4, 5, 8 of these devices,
respectively
• Are these states safe?
State 1:
Job            1    2    3
# allocated    0    2    4
Max needed     4    5    8

State 2:
Job            1    2    3
# allocated    2    3    4
Max needed     4    5    8
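A minimal C sketch of the safety test behind the Banker’s algorithm, run on the second state above (the numbers come from the tables; everything else is illustrative). Swapping in alloc = {0, 2, 4} reports the first state safe.

#include <stdio.h>
#include <stdbool.h>

#define N 3                         /* number of jobs */

int main(void) {
    int total = 10;                 /* 10 devices of the same type */
    int alloc[N] = {2, 3, 4};       /* # allocated (second state) */
    int max[N]   = {4, 5, 8};       /* max needed */
    bool done[N] = {false};

    int avail = total;
    for (int i = 0; i < N; i++) avail -= alloc[i];

    /* Repeatedly find a job whose remaining need fits in what is
       available; "run" it to completion and reclaim its devices. */
    int finished = 0;
    bool progress = true;
    while (progress) {
        progress = false;
        for (int i = 0; i < N; i++)
            if (!done[i] && max[i] - alloc[i] <= avail) {
                avail += alloc[i];
                done[i] = true;
                finished++;
                progress = true;
                printf("job %d can finish; avail now %d\n", i + 1, avail);
            }
    }
    printf("%s\n", finished == N ? "safe state" : "UNSAFE state");
    return 0;
}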
Handling deadlock
• Continually employ a detection algorithm
– Search for cycle
– Can do it occasionally
• When deadlock detected, perform recovery
• Recover by killing
– Kill 1 process at a time until deadlock cycle gone
– Kill which process? Consider: priority, how many resources it
has, how close it is to completion.
• Recover by resource preemption
– Need to restart that job in near future.
– Possibility for starvation if the same process is selected over and
over.
Detection algorithm
• Start with allocation graph, and “reduce it”
While changes still occur, do:
• Find a process using a resource & not waiting for one.
Remove edge: process will eventually finish.
• Can now re-allocate this resource to another process, if
needed.
• Also can perform other resource allocations for
resources not fully allocated.
• If there are any edges left, we have deadlock.
Example
• 3 processes & 3 resources
• Edges:
– (R1 → P1)
– (P1 → R2)
– (R2 → P2)
– (P2 → R3)
– (R3 → P3)
• Can this graph be reduced to the point that it has no
edges?
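A small C sketch of this reduction for the single-instance example above. The data structures are made up: holds[p] and waits[p] give the one resource each process holds or awaits (-1 = waiting for nothing).

#include <stdio.h>

#define NP 3

int main(void) {
    int holds[NP]   = {0, 1, 2};   /* P1 holds R1, P2 holds R2, P3 holds R3 */
    int waits[NP]   = {1, 2, -1};  /* P1 waits for R2, P2 for R3, P3 for none */
    int freeres[NP] = {0, 0, 0};   /* becomes 1 when a resource is released */
    int done[NP]    = {0, 0, 0};

    int progress = 1;
    while (progress) {
        progress = 0;
        for (int p = 0; p < NP; p++) {
            if (done[p]) continue;
            /* a waiting process can proceed once its resource is free */
            if (waits[p] != -1 && freeres[waits[p]]) waits[p] = -1;
            if (waits[p] == -1) {       /* not waiting: will finish */
                freeres[holds[p]] = 1;  /* release what it holds */
                done[p] = 1;
                progress = 1;
                printf("P%d finishes, releasing R%d\n", p + 1, holds[p] + 1);
            }
        }
    }
    for (int p = 0; p < NP; p++)
        if (!done[p]) { printf("deadlock: P%d stuck\n", p + 1); return 1; }
    printf("graph fully reduced: no deadlock\n");
    return 0;
}

Running it shows P3, then P2, then P1 finishing, so the answer is yes: the graph reduces completely.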
CS 346 – Chapter 8
• Main memory
– Addressing
– Swapping
– Allocation and fragmentation
– Paging
– Segmentation
• Commitment
– Please finish chapter 8
Addresses
• CPU/instructions can only access registers and main
memory locations
– Stuff on disk must be loaded into main memory
• Each process given range of legal memory addresses
– Base and limit registers
– Accessible only to OS
– Every address request compared against these limits
• When is address of an object determined?
– Compile time: hard-coded by programmer
– Load time: compiler generates a relative address
– Execute time: if address may vary during execution because the
process moves. (most flexible)
Addresses (2)
• Logical vs. physical address
– Logical (aka virtual): The address as known to the CPU and
source code
– Physical = the real location in RAM
– How could logical and physical address differ? In case of
execution-time binding. i.e. if the process location could move
during execution
• Relocation register
– Specifies what constant offset to add to logical address to obtain
physical address
– CPU / program never needs to worry about the “real” address, or that addresses of things may change. It can pretend its addresses start at 0.
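A tiny C sketch of the limit check plus relocation add; the register values are made up.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

static uint32_t base  = 0x00140000;  /* relocation register */
static uint32_t limit = 0x00010000;  /* size of the process's space */

/* Every logical address the CPU issues is checked and relocated. */
uint32_t translate(uint32_t logical) {
    if (logical >= limit) {
        fprintf(stderr, "trap: address 0x%x out of range\n", logical);
        exit(1);
    }
    return base + logical;           /* physical = logical + offset */
}

int main(void) {
    printf("logical 0x0000 -> physical 0x%x\n", translate(0x0000));
    printf("logical 0x2f00 -> physical 0x%x\n", translate(0x2f00));
    return 0;
}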
Swapping
• A process may need to go back to disk before finishing.
– Why?
• Consequence of scheduling (context switch)
• Maintain a queue of processes waiting to be loaded from
disk
• Actual transfer time is relatively huge
– When loading a program initially, we might not want to load the
whole thing
• Another question – what to do if we’re swapped out while
waiting for I/O.
– Don’t swap if waiting for input; or
– Put input into buffer. Empty buffer next time process back in
memory.
Allocation
• Simplest technique is to define fixed-size partitions
• Some partitions dedicated to OS; rest for user processes
• Variable-size partitions also possible, but must maintain
starting address of each
• Holes to fill
• How to dynamically fill hole with a process:
– First fit: find the first hole big enough for process 
– Best fit: find smallest one big enough 
– Worst fit: fit into largest hole, in order to create largest possible
remaining hole 
• Internal & external fragmentation
Paging
• Allows for noncontiguous process memory space 
• Physical memory consists of “frames”
• Logical memory consists of “pages”
– Page size = frame size
• Every address referenced by CPU can be resolved:
– Page number
– Offset
– how to do it? Turns out page/frame size is power of 2.
Determines # bits in address.
• Look up page number in the page table to find correct
frame
Example
• Suppose RAM = 256 MB, page/frame size is 4 KB, and
our logical addresses are 32 bits.
– How many bits for the page offset?
– How many bits for the logical/virtual page number?
– How many bits for the physical page number?
– Note that the page offsets (logical & physical) will match.
• A program’s data begins at 0x1001 0000, and text
begins at 0x0040 0000. If they are each 1 page, what is
the highest logical address of each page?
• What physical page do they map to?
• How large is the page table?
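A short C sketch of the address split with 4 KB pages (low 12 bits = offset, upper 20 bits = logical page number), using the text-segment address from the slide; it also prints the highest address on that page.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12                       /* 4 KB = 2^12 */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

int main(void) {
    uint32_t addr   = 0x00400000;           /* start of text segment */
    uint32_t vpn    = addr >> PAGE_SHIFT;   /* logical page number */
    uint32_t offset = addr & PAGE_MASK;     /* page offset */
    printf("addr 0x%08x -> page 0x%05x, offset 0x%03x\n", addr, vpn, offset);
    /* The highest logical address on this page is addr | PAGE_MASK. */
    printf("last byte of page: 0x%08x\n", addr | PAGE_MASK);
    return 0;
}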
Page table
• HW representation
– Several registers
– Store in RAM, with pointer as a register
– TLB (“translation look-aside buffer”)
Functions as a “page table cache”: Should store info about most
commonly occurring pages.
• How does a memory access work?
– First, inspect address to see if datum should be in cache.
– If not, inspect address to see if TLB knows physical address
– If no TLB tag match, look up logical/virtual page number in the
page table (thus requiring another memory access)
– Finally, in the worst case, we have to go out to disk.
Protection, etc.
• HW must ensure that accesses to TLB or page table are
legitimate
– No one should be able to access frame belonging to another
process
• Valid bit: does the process have permission to access
this frame?
– e.g. might no longer belong to this process
• Protection bit: is this physical page frame read-only?
• Paging supports shared memory. Example?
• Paging can cause internal fragmentation. How?
• Sometimes we can make page table more concise by
storing just the bounds of the pages instead of each one.
Page table design
• How to deal with huge number of pages
• Hierarchical or 2-level page table
– In other words, we “page” the page table.
– Split up the address into 3 parts. “outer page”, “inner page” and
then the offset.
– The outer page number tells you where to find the appropriate
part of the (inner) page table. See Figure 8.15.
– Not practical for 64-bit addressing! Why not?
• Hashed page table
– Look up virtual page number in a hash table.
– The contents of the cell might be a linked list: search for match.
• Inverted page table
– A table that stores only the physical pages, and then tells you which logical page maps to each. Any disadvantage?
Segmentation
• Alternative to paging
• More intuitive way to lay out main memory…
– Segments do not have to be contiguous in memory
– Process has segment table: For each segment, stores the base
address and size
• As before, a process has a “logical address space”
– But now: it consists of segments, each having a name and size.
– How does a program(mer) specify an address in a segmented
scheme?
– What kinds of segments might we want to create for a program?
• HW may support both paging and segmentation
– So, OS may exploit either or both addressing techniques.
– To ignore segmentation, just use 1 segment for entire process.
Pentium example
• To convert logical to physical address
– Handle the segmentation first…
– Segmentation unit takes the logical address, and converts this to
a linear address (why?)
– Paging unit takes the linear address and converts this to a
physical address (somewhat familiar process)
• A segment may be up to 4 GB, so offset is 32 bits
– Logical address has 2 parts: segment number plus offset
– Look up segment number into “descriptor table”. Entries in this
table give the upper bits of the 32-bit linear address.
• Pentium uses 2-level paging
– Outer and inner page numbers are 10 bits each. What
information does this tell you?
CS 346 – Section 9.1-9.4
• Virtual memory
– (continues similar themes from main memory chapter)
– What it is
– Demand paging
– Page faults
– Copy on write
– Page replacement strategies
Virtual memory
• Recall: main memory management seeks to support
multiprogramming
• VM principles
– Allow process to run even if only some of it is in main memory
– Allow process to have a logical address space larger than all
physical memory
– Allow programmer to be oblivious of memory management
details, except in extreme cases.
• Motivation
– Some code is never executed. Some data never used.
– Programmer may over-allocate an array.
– Even if we need to load entire program, we don’t need it all at
once.
– We’ll use less RAM, and swap fewer pages.
Using VM
• The programmer (or compiler) can refer to addresses
throughout entire (32-bit) address space.
– In practice, may be restricted, because you may want to have
virtual addresses for outside stuff; but still a huge fraction
– All addresses will be virtual/logical, and will be translated to
actual physical address by OS and HW
– We can allocate a huge amount of VM for stack and heap, which
may grow during execution.
– Stack and heap will be unlikely to bump into each other.
• Supports sharing of code (libraries) and data
– Virtual addresses will point to the same physical address
Demand paging
• Typical way to implement VM
• Only bring a page in from disk as it is requested.
– What is benefit? Why not load all pages at once?
– “lazy pager” more accurate term than “lazy swapper”
• Pager initially guesses which pages to load
– “pure demand paging” skips this step
• Valid bit: is this page resident in RAM?
• If not: page fault
– The page we want is not in physical memory (i.e. it’s in the
“swap space” on disk)
– How often does this happen?
– Temporal and spatial locality help us out 
Page fault
Steps to handle:
• OS verifies the problem is not more severe
• Find free space in RAM into which to load proper page
• Disk operation to load page
• Update page table
• Continue execution of process
• Cost of page fault ~ 40,000x normal memory access
– Probability should be minuscule
Copy-on-write
• A memory optimization
• Can be used when we fork, but not exec
• No real need to duplicate the address space
– Two processes running the same code, accessing same data
• Until… one of the processes wants to write.
– In this case, we create a 2nd copy of the page containing the
written-to area.
– So, we only copy some pages. Compare Figures 9.7 and 9.8
• If you want to exec immediately after fork, you would not even need copy-on-write.
– vfork( ) system call: child shares same pages as parent. Child
should not alter anything here because of the exec. But if child
did, changes would be seen by parent.
Page fault
• Demand paging to implement virtual memory √
• What is a page fault?
• How to handle
– … Find a free frame and load the new page into it …
– But what if no frame is free? Aha!
• Extreme approaches
– Terminate process if no free frame available
– Swap out a process and free all the frames it was using
• Alternative: replace (i.e. evict) one of the resident pages
– Need to amend the procedure for handling page fault:
– Copy victim to disk if necessary; replace frame with new page
– Let process continue
Issues
• Frame allocation
– How many frames should we give to each process?
– If more than enough, never need to evict a page. (Too good…)
– More about this later (section 9.5)
• Page replacement algorithm
– Need a way to pick a victim
– Many such algorithms exist
– Goal: reduce total # of page faults (or the rate), since costly!
• To simplify analysis of page behavior, use “reference
string”: list of referenced pages, rather than complete
addresses. (p. 412)
– Given # frames, replacement algorithm and reference string,
should be able to determine # of page faults.
Clairvoyant
• The clairvoyant page replacement algorithm is optimal.
– In other words, the minimum possible number of page faults
• Replace the page that will not be used for the longest
period of time in the future.
• Not realistic to know such detailed info about the future,
so it’s not a real algorithm
• Useful as a benchmark.
– If your algorithm is better, check your arithmetic.
FIFO
• “First in, first out” – queue philosophy
• Evict the page that has been resident the longest.
• Example with 3 frames:
– 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1
– 15 page faults, compared to 9 with clairvoyant
• Does this policy make sense?
– Being “old” has nothing to do with being useful or not in the
future.
– Startup routines may no longer be needed. Ok.
– Does a grocery store get rid of bread to make way for green tea?
• Belady’s anomaly
– Undesirable feature: it’s possible to increase # frames and see
an increase in # of page faults. (p. 414)
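A minimal C simulation of FIFO on this reference string with 3 frames; it reproduces the 15 faults quoted above.

#include <stdio.h>

int main(void) {
    int refs[] = {7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1};
    int n = sizeof refs / sizeof refs[0];
    int frames[3] = {-1, -1, -1};
    int next = 0, faults = 0;           /* next = oldest resident frame */

    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int f = 0; f < 3; f++)
            if (frames[f] == refs[i]) hit = 1;
        if (!hit) {
            frames[next] = refs[i];     /* evict the oldest page */
            next = (next + 1) % 3;
            faults++;
        }
    }
    printf("FIFO page faults: %d\n", faults);   /* prints 15 */
    return 0;
}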
LRU
• “Least recently used”
• Attempts to be more sensible than FIFO
– More akin to a stack, rather than a queue
• Example with 3 frames
– 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1
Has 12 page faults 
• Problem: how to represent the LRU information
– Stack of page numbers: reference a page → bring it to the top; evict the page at the bottom of the stack (the least recently used one).
– Associate a counter or timestamp for each page. Search for min.
– HW might not support these expensive ops: require significant
overhead, e.g. update for each reference.
Almost LRU
• We want to perform fewer HW steps
– It’s reasonable to test/set a bit during a memory reference. Not
much more than this.
– Gives rise to reference bit(s) associated with each page.
• Second chance FIFO
– When a page is referenced, set its reference bit.
– When time to find victim, scan the frames. If ref bit = 1, clear it.
If ref bit already 0, we have our victim. Next time need to search
for victim, continue from here (circular/“clock” arrangement).
• Multiple reference bits
– Periodically shift left the ref value. Evict page that has all 0’s.
• Use reference count (MFU or LFU)
– When a page is referenced, increment its reference value.
– Policy may be to evict either least or most frequently referenced.
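A sketch of the second-chance (clock) victim scan in C; the frame count, reference-bit values, and names are all illustrative.

#include <stdio.h>

#define NFRAMES 4

static int ref[NFRAMES];   /* reference bits, set by HW on each access */
static int hand = 0;       /* the clock hand */

/* Scan from the hand: clear set reference bits as we pass, evict the
   first frame whose bit is already 0, and leave the hand just past it
   so the next search continues from there. */
int pick_victim(void) {
    for (;;) {
        if (ref[hand] == 0) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        ref[hand] = 0;                  /* second chance */
        hand = (hand + 1) % NFRAMES;
    }
}

int main(void) {
    ref[0] = ref[1] = 1;                /* frames 0 and 1 recently used */
    printf("victim: frame %d\n", pick_victim());   /* prints 2 */
    return 0;
}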
CS 346 – Sections 9.5-9.7
• Paging issues
– How big should a page be?
– Frame allocation
– Thrashing
– Memory-mapped files & devices
• Commitment
– Please finish chapter 9
Page size
• HW may have a default (small) page size
– OS can opt for somewhat larger sizes
– If we want 4K pages, but the default is 1K, then tell HW to always
group its pages in fours
• Small or large?
– On average, ½ of final page will be blank (internal fragmentation)
– But, small pages → larger page table
• Let’s measure overhead
– s = average process size; p = page size; e = size of page table entry
– We’ll need about s/p pages, occupying se/p bytes for page table.
– Last-page waste = p/2
– Total overhead = se/p + p/2. See the trade-off?
– Optimum result p = sqrt(2se) ≈ sqrt(2 · 1 MB · 8) = 4 KB
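• Where the optimum comes from: minimize f(p) = se/p + p/2. Setting f′(p) = –se/p² + 1/2 = 0 gives p² = 2se, so p = sqrt(2se). With s = 1 MB = 2^20 bytes and e = 8 bytes: sqrt(2 · 2^20 · 8) = sqrt(2^24) = 2^12 = 4 KB.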
Frame allocation
• A process needs a certain minimum number of frames.
• Some instructions may require 2 memory references
(unusual), plus the instruction itself.
– All 3 memory locations may be in different pages
– To execute this single instruction, we would need 3 frames.
– Also, a memory reference could straddle a page boundary. Not
a good HW design.
– Book mentions example of inst requiring up to 8 frames.
• Equal allocation among processes
• Proportional allocation (to total process size)
• Priority bonus
• Allocation needs to be dynamic: changes in # processes
Allocation (2)
• Global or local page replacement?
– Local = you can only evict your own pages
With this policy: the number of frames allocated to a process
never changes.
– Global = you can evict someone else’s page
You are at the mercy of other processes. # of page faults
depends on the environment.
But if you need extra space, you can take it from someone who
isn’t using theirs.
More responsive to actual memory needs → better throughput.
• Non-uniform memory
– With multiple CPUs and memories, we desire frames that are
“closer” to the CPU we are running on.
Thrashing
• Spending more time paging than doing useful work.
• How does it occur?
– If CPU utilization low, OS may schedule more jobs.
– Each job requires more frames for its pages. It takes frames
away from other jobs. More page faults ensue.
– When more and more jobs wait to page in/out, CPU utilization
goes down. OS tries to schedule more jobs.
– Fig 9.18 – don’t have too many jobs running at once!
• To avoid the need for stealing too many frames from
other jobs, should have enough to start with.
– Locality principle: At any point in the program, we need some,
but not all of our pages. And we’ll use these pages for a while.
Loops inside different functions.
• Or: swap out a job and be less generous in future.
Working set model
• A way to measure locality by Peter Denning, 1968.
• Begin by setting a window
– How far back in time are you interested?
– Let Δ = the number of memory references in the recent past
– What if Δ is too big or too small?
• Working set = set of pages accessed during window
– Another number: Working set size (WSS) :
How many pages accessed during the window
– For example, we could have Δ = 10,000 and WSS = 5.
• OS can compute WSS for each job.
– If extra frames still available, can safely start a new job.
• Practical consideration: how often to recalculate WSS?
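A small C sketch that computes WSS over a sliding window of Δ references; the trace and window size are made up.

#include <stdio.h>

#define DELTA 10      /* window: how far back we look */
#define NPAGES 16

int main(void) {
    int trace[] = {1,2,1,3,1,2,4,1,2,1,1,2,1,1,2,1,1,1,2,1};
    int n = sizeof trace / sizeof trace[0];

    for (int t = DELTA; t <= n; t++) {
        int seen[NPAGES] = {0}, wss = 0;
        for (int i = t - DELTA; i < t; i++)   /* scan the window */
            if (!seen[trace[i]]) { seen[trace[i]] = 1; wss++; }
        printf("t=%2d  WSS=%d\n", t, wss);
    }
    return 0;
}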
Memory mapping
• Often we read a file sequentially from start to finish.
– Seems a shame to make so many system calls and disk
accesses for something so routine (e.g. read a single character
or line of text).
– Instead, pages in memory get allocated to file on disk.
• When writing data to file, disk contents not immediately
updated.
– RAM acts as buffer
– periodic checks: if something written, write to disk
– Final writes when job is done.
• For read-only file, multiple jobs can share this memory
• Other I/O devices also mapped to pages (screen, printer,
modem) 
CS 346 – rest of Ch. 9
• Allocating memory for kernel
• Making paging work better
– prepaging
– TLB reach
– Memory-aware coding
– Locking pages
Kernel memory
• Some memory is reserved for kernel
• To minimize overhead
– (fragmentation): we don’t allocate entire pages at a time
– (efficiency / direct memory access): OS would like to allocate a
contiguous block of memory of arbitrary size
• Simple approach: “buddy system”:
• Memory manager maintains list of free blocks of size 1,
2, 4, 8, … bytes up to some maximum e.g. 1 MB.
• Initially, we have just 1 free block: the entire 1 MB.
– Over time this may get split up into smaller pieces (buddies).
• When kernel needs some memory, we round it up to the
next power of 2.
• If no such size available, split up something larger.
Example
Memory starts as one free 1024K block. Allocations are shown as letters with their rounded-up block sizes; holes are shown as plain sizes:

Start:             1024
Request A = 70K:   A(128) | 128 | 256 | 512
Request B = 35K:   A(128) | B(64) | 64 | 256 | 512
Request C = 80K:   A(128) | B(64) | 64 | C(128) | 128 | 512
Return A:          128 | B(64) | 64 | C(128) | 128 | 512
Request D = 60K:   128 | B(64) | D(64) | C(128) | 128 | 512
Return B:          128 | 64 | D(64) | C(128) | 128 | 512
Return D:          256 | C(128) | 128 | 512
Return C:          1024

When a block of size 2^k is freed, memory manager only has to search other 2^k blocks to see if a merge is possible.
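Two buddy-system calculations sketched in C: rounding a request up to the next power of two, and locating a block’s buddy by XOR-ing its offset with its size (which is what makes the 2^k merge search cheap). The numbers follow the example above.

#include <stdio.h>

unsigned round_up_pow2(unsigned k) {
    unsigned size = 1;
    while (size < k) size <<= 1;   /* smallest power of 2 >= k */
    return size;
}

int main(void) {
    printf("70K request -> %uK block\n", round_up_pow2(70));  /* 128K */
    printf("35K request -> %uK block\n", round_up_pow2(35));  /*  64K */

    /* Buddy of a 64K block at offset 128K: 128 XOR 64 = 192. */
    unsigned offset = 128, size = 64;
    printf("buddy of %uK block at %uK is at %uK\n",
           size, offset, offset ^ size);
    return 0;
}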
Slab
• Relatively new technique
• Kernel objects are grouped by type → in effect, grouped by size
– e.g. semaphores, file descriptors, etc.
• OS allocates a “cache” to hold objects of the same type.
– Large enough to hold several such objects. Some are unused,
i.e. “free”
• How many objects are in a cache?
– 1 page (4 K) is usually not enough. So we may want several
contiguous pages – this is called a slab.
– So, we achieve contiguous memory allocation, even though the
objects might not be resident contiguously themselves. See
figure 9.27
Prepaging
• In order to avoid initial number of page faults, OS can
bring in all needed pages at once.
• Can also do this when restarting a job that was swapped
out. Need to “remember” the working set of that job.
• But: will the job need “all” of its pages?
• Is the cost of prepaging < cost of servicing all future
individual page faults?
TLB reach
• For paging to work well, we want more TLB hits too
• TLB reach = how much memory is referred to by TLB
entries
– Memory-intensive process → more TLB misses
• Approaches to improve TLB hit rate
– Larger TLB
But sometimes, to achieve acceptable hit rate, need
unreasonably large table!
– Allow for larger page size
For simplicity, can offer 2 sizes (regular and super)
OS must manage the TLB, so it can change page size as
needed. Any disadvantage?
Memory-aware code
• Page faults do happen. Keep working set small if you
can.
• Let’s initialize array elements. Does it matter if we proceed row or column major? (see the sketch after this list)
• Data structures: stack, queue, hash table
• BFS vs. DFS – which is better with respect to memory?
• array versus ArrayList
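The sketch promised above: C stores each row contiguously, so the first loop walks memory sequentially (small working set, few page faults), while the second jumps a full row’s worth of bytes on every access and cycles through many pages repeatedly.

#include <stdio.h>

#define N 1024

static int a[N][N];

int main(void) {
    /* Row-major order: good locality, small working set. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 0;

    /* Column-major order: same work, much worse locality. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            a[i][j] = 0;

    printf("done\n");
    return 0;
}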
Locking pages
• Sometimes we want to make sure some pages don’t get
replaced (evicted)
• Each frame has a lock bit
• I/O
– Actual transfer of data performed by specialized processor, not
the CPU
– When you request I/O, you go to sleep while the transfer takes
place.
– You don’t want the I/O buffer pages to be swapped out!
• Kernel pages should be locked
• Can lock a page until it has been used a little
– To avoid situation where we replace a page we just brought in
CS 346 – Chapter 10
• Mass storage
– Advantages?
– Disk features
– Disk scheduling
– Disk formatting
– Managing swap space
– RAID
Disks
• Anatomy (Figure 10.1)
– Sector, track, cylinder, platter
– Read/write head attached to arm, attached to arm assembly
– Head quickly reads binary data as: orientation of iron ions or
reflectiveness of surface
• Example: CD
– About 25,000 tracks, 50 sectors per track → 1 bit occupies about 1 square µm
– Entire CD can be read in about 7 minutes on a 12x speed drive
• But usually we don’t read entire disks
2 aspects dominate access time:
– Seek time: proportional to square root of seek distance
– Rotational latency
Some specs
                       Floppy disk   Hard drive (2001)   Hard drive (2011)
Cylinders              40            10 601              310 101
Tracks/cylinder        2             12                  16
Sectors/track          9             281 (average)       63
Sectors/disk           720           35 742 000          312 500 000
Bytes/sector           512           512                 512
Capacity               360 KB        18 GB               160 GB
Seek adjacent track    6 ms          0.8 ms
Seek (average)         77 ms         6.9 ms              9.5 ms
Rotation               200 ms        8.3 ms              8.3 ms
Transfer 1 sector      22 ms         17 µs               1.7 µs
Disk scheduling
• Common problem is a backup of disk requests
• Disk queue
• When disk is ready, in what order should it do the disk
requests? Similar to problem of scheduling CPU
• Pending jobs are classified by which track/cylinder they
want to access
• Ex. 4, 7, 16, 2, 9, 1, 9, 5, 6
• Several disk scheduling algorithms exist
– Simple approach: first-come, first-served
– Total head movement = ?
– Want to reduce total seeking time or head movement: avoid
“wild swings”.
– Would be nice not to finish at extreme sector number.
Scheduling (2)
• Shortest seek first
– For 4, 7, 16, 2, 9, 1, 9, 5, 6: After serving track 4, where do we
go next? Total head movement = ?
– Very good but not optimal
• Elevator algorithm (“scan” method)
– Pick a direction and go all the way to end, then come back and
handle all other requests.
– Better than Shortest Seek in our example?
• Circular scan
– Same as elevator algorithm BUT: when you reach the end you
immediately go to the other end without stopping for requests. In
other words, you only do work as head is moving in 1 direction.
• Look scheduling: modify elevator & circular scan so you
only go as far as highest/lowest request
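A small C calculation of total head movement under first-come, first-served for the example queue. The starting head position (track 4, the first request) is an assumption; the slide doesn’t fix one.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int q[] = {4, 7, 16, 2, 9, 1, 9, 5, 6};
    int n = sizeof q / sizeof q[0];
    int head = q[0], moved = 0;

    for (int i = 1; i < n; i++) {    /* serve requests in arrival order */
        moved += abs(q[i] - head);
        head = q[i];
    }
    printf("FCFS total head movement: %d tracks\n", moved);   /* 54 */
    return 0;
}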
Disk mgmt
• Low-level (physical) formatting
– Dividing disk medium into sectors
– Besides data, sector contains error-correcting code
– Later, disk controller will manipulate individual sectors
• High-level (logical) formatting
– Record a data structure for file system on disk
– Partition groups of cylinders if desired
– Associate adjacent blocks into logical clusters to support file I/O
• “Sector sparing”: compensate for bad blocks!
– Maintain list of bad blocks; replace each with a spare one
• Boot from disk: boot blocks in predefined locations contain system code to load → “boot partition” of drive
Swap space
• Recall: used in virtual memory to store pages evicted
from RAM
– Faster to return to RAM than loading from file from scratch
– In effect: disk space is now being used as extension of main
memory, the very essence of VM
• Logically a separate partition of the disk from the file
system
• When process started, it’s given some swap space
• Swap map: kernel data structure to track usage
– Associate a counter value with each page in swap area
– 0 means that page is available to swap into
– Positive number: number of processes using that swapped-out
data (> 1 means it’s shared data)
RAID
• Increasingly practical to have several disks on a system
– But more disks increase the probability of a failure (i.e. shorter mean time to failure)
• RAID = “redundant array of independent disks”
– Redundancy: fault tolerance technique
• Six “levels” or strategies of RAID: use various
combinations of fault tolerant techniques
• Typical RAID techniques in use
– Striping a group of disks: split bits of each byte across disks
Or block-level striping: split blocks of a file…
– Mirroring another disk
– Store parity (error-correcting) bits on another disk
– Leaving some disks empty until needed to replace failed disk
RAID levels
Various combinations of techniques… For example:
• RAID 0 – block striping; no mirroring or parity bits
• RAID 1 – add mirrored disks
• RAID 2, 3, 4 – extra disks store parity bits
– If 1 disk fails, remaining bits of each byte and error-correction bit can be used to reconstruct the lost bit of each byte.
– RAID 3 – bit-interleaved parity
– RAID 4 – block-interleaved parity
• RAID 0+1 – a set of disks is striped, and then the stripe is mirrored to a second, equivalent set of disks
• RAID 1+0 – disks are mirrored in pairs; the mirrored pairs are then striped.
RAID extensions
• RAID is designed just to detect & handle disk failure
• Does not prevent/detect data corruption, etc.
– Could be pointing to wrong file, wrong block
• Checksum for data and metadata on disk
– Ex. For each disk block, how many bits are set?
– Store with pointer to object (See Figure 10.13)
– Detect whether it has changed. Grab correct data from the
mirror.
• RAID also somewhat inflexible because its techniques
require a certain number of disks. What to do?
CS 346 – Chapter 11
• File system
– Files
– Access
– Directories
– Mounting
– Sharing
– Protection
Files
• What is a file?
• Attributes
– Name, internal ID, type, location on device, size, permissions,
modification/creation time
• Operations
– Create, read, write, reposition file pointer (seek), delete, truncate
(i.e. to zero)
– Less essential: append, rename, copy
– The first time we refer to a file, need to search for it: “open”
• Active file tables. What is stored in each?
– Table per process
– System-wide table
• The “open count” for a file
Type and structure
• Policy question – should OS be aware of file types?
• How file type determined
– filename extension
– Keep track of which application created file
– Magic number
• File type determines its structure
– At a minimum: bits and bytes
– e.g. OS expects executable file to have certain format
– Text file: recognize meaning of certain ASCII codes
• Files stored in “blocks” on a device
– Each I/O operation can grab one block (~ 1KB <= page size)
– Can start a new file on a new block, or do some “packing”
Accessing data
• Sequential access
– Read, write, rewind operations
– We almost always utilize files this way
• Direct access
– More complex system calls: Allow arbitrary access to any byte
in file on demand
– What kind of application needs this functionality?
– Read/write operations may specify a relative or absolute block
number
• Indexed access
– Another file stores pointers to appropriate blocks in some large
file
Directories
• File system resides on some “volume”
– A volume may be a device, part of a device, multiple devices:
– So, can have multiple file systems on the same device (partition)
– A file system can use multiple devices, but this adds complexity
• Can have specialized “file systems” to allow certain
devices to be treated as files, with file I/O commands
• Volume must keep around info about all files
– Confusingly called a directory
• Directory operations on files:
– Search, create, delete, list, rename, traverse
File organization
• How are files logically organized in the directory?
• Single-level directory: one flat list
– File names must be unique
– Excellent if everyone is sharing files
• Two-level directory
– Each user has a separate directory: Figure 11.9
– System maintains a master file directory: pointers to each user’s
file directory
– Allows user’s work to be isolated
– Can specify file by absolute or relative path name
– Special “system user” for system files. Why necessary?
– Search path: sequence of directories to use when searching for
a file. Look here, look in system folder, etc.
File org (2)
• Tree-based directory: Files can be arbitrarily deep
• Allows user to impose local structure on files
• Each process has a current working directory
– To access file, need to specify path name or change the current
directory
• Policy on deleting an entire directory
• Acyclic directory: support links to existing files
– In effect, the same file has multiple path names
– Same file exists in multiple directories
– But there is just 1 file, not a copy
– When traversing, need to ignore the links
– What happens when we delete file? Links now point to …
– Can count the # of references to file (like garbage collection)
Mounting
• Mount = make volume/device available to file system.
• Assign a name to its root so that all files will have a
specific path name.
• Mount point = position in existing file system in which we
insert the new volume.
– Think of inserting a subtree at a new child of an existing node.
– E.g. You plug in a USB drive, and immediately it acquires the
name E: so you can access its files
– In UNIX, a new “volume” may appear under /
• Unused volumes may be temporarily unmounted if file
system desires
File sharing
• In multi-user system, desirable to have some files
accessible by multiple users!
• File system must have more info
– Owner of each file
– Assign unique ID numbers for users and groups of users
– When you access file, we check your IDs first
• Remote file system access
– Manually transfer files via FTP
– Distributed file system: see a file system on another computer
on the network
– Anonymous browsing on the Web
Remote file system
• We’d like to mount a remote file system on our machine.
– In other words, be able to give (path) names to remote files to
manipulate them.
• Client-server relationship: a file server accepts requests from remote machines to mount
– E.g. You are logged into ultrax2, but ultrax1 is the file server.
– NFS is a standard UNIX file sharing protocol
– OS file system calls are translated into remote calls
• One challenge – to authenticate the client.
– Typically the client & server share same set of user IDs. When
you get a computer account, your user ID is good everywhere.
– Or, provide your password the first time you access server.
• What is role of distributed naming service, e.g. DNS ?
Consistency
• Policy decisions concerning how we handle multiple
users accessing the same file
– Reminiscent of synchronization
• When do changes made by one user become
observable to others?
– Immediately, or not until you reopen the file?
• Should we allow 2 users to read/write concurrently?
– As in a database access
• System may define immutable shared file
– Like a CD-R
– Cannot be modified, name cannot be reused.
– No constraints on reading
Protection
• Owner/creator of file should set capabilities for
– What can be done
– By whom
• Types of access
– Read
– Write
– Execute
Could also distinguish other access capabilities:
– Delete
– List
Specifying permissions
• Establish classes of users, each with a possibly distinct
set of permissions
– Classes can be: owner, group, rest of world
• For each level of users:
– ‘r’ = Can I read the file?
– ‘w’ = Can I write to (or delete) the file?
– ‘x’ = Can I execute the file?
• Examples
– rw-rw-r--   (664)
– rwxr-xr--   (754)
– rw-r-----   (640)
• If no groups, can set group permission = rest of world.
• Use chmod command
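The same 640 setting can also be made from a program; chmod() is the POSIX call underneath the chmod command. The file name here is made up.

#include <stdio.h>
#include <sys/stat.h>

int main(void) {
    /* owner rw, group r, world none: the rw-r----- example above */
    if (chmod("notes.txt", 0640) != 0) {
        perror("chmod");
        return 1;
    }
    return 0;
}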
CS 346 – Chapter 12
• File systems
– Structure
– Information to maintain
– How to access a file
– Directory implementation
– Disk allocation methods → efficient use, quick access
– Managing free space
– Efficiency, performance
– Recovery
Structure
• File system is usually built on top of disks
– Medium can be rewritten in place
– Relatively easy to move to another place on disk
• Purpose of file system
– Provide a user interface to access files
– Define a mapping between logical files and space on a
secondary storage device
• FS have several levels/layers of abstraction &
functionality, e.g. 4
– Logical file system
– File organization module
– Basic file system
– I/O control
Layers
• Logical file system
– Maintain file’s metadata: inside a “file control block” aka “inode”
– Directory data structure
• File-organization module
– Translates between logical and physical data blocks of a file. In
other words it knows everybody’s real address.
– e.g. logical block numbers might always start at 0
• Basic file system
– Manipulate specific sectors on disk.
– Maintain buffers for file I/O
• I/O control
– Device drivers give machine-language commands to device to
accomplish the file I/O.
– (Different file systems can use the same device drivers.)
FS information
On disk:
• Boot (control) block = first block on a volume. Gives instructions on how to load the OS.
• Volume control block =
“superblock”
– Statistics about the volume:
# of blocks, their size, how
many are free and which
ones
• Directory data structure: point
to each file
• File control block (inode) for
each file (contains what info?)
In memory:
• Which volumes are currently
mounted
• Cache of recently accessed
directories (faster access)
• Which files are currently open
– Per process
– System-wide
• Buffers holding currently
processing file I/O
Opening file
• open( ) system call passes file name to logical FS
• See if anyone else already has this file opened.
– How?
– What if it is?
• If not already open, search the directory
• If found,
– copy file control block (inode) to system-wide open file table
– Set pointer in process’ open file table (Why not the inode?)
– Also in process’ table: dynamic stuff like initialize current
location within file, whether opened for read or write, etc. Should
we copy this to inode also?
• open( ) returns file descriptor (i.e. pointer to per-process
table entry). Use this for future I/O on this file.
Multiple file systems
• We generally don’t have 1 file system in charge of the
entire disk
• Disks usually have partitions…
• Raw partition
– Where you don’t want/need to have files
– Ex. Swap space; information related to backups
• Boot partition – should be treated special / separate
– Contains program to ask user which OS to boot
– Multiple OS can give rise to different FS
• Use “virtual file system” to manage multiple FS
– Hide from user the fact that > 1 FS exists
Virtual FS
• See Figure 12.4
• Purpose: act as an interface between the logical FS the
user interacts with, and the actual local/remote file
system
• Defines essential object types, for example
– File metadata, e.g. inode
– Info about an open file
– Superblock: info about an entire file system
– Directory entries
• For each type of object, set of operations defined, to be
implemented by individual FS
– Ex. For a file: open, read, write, …
Directory rep’n
• A question of which data structure to use
• Linear list?
– Essentially an array of pointers (we point to the data blocks)
– Advantage / Disadvantage?
• Other data structures are possible: any good?
– Sorted list
– Binary search tree; B-tree
– Hash table
1. Contiguous allocation
• Advantage – few seeks needed on disk
– Ex. We would like a file to reside entirely in one track if possible
• If you know disk address of first block, and length of file,
you know where entire file is.
• Problems
– Where to put a new file: dynamic storage allocation: best fit,
worst fit, first fit
– External fragmentation
– Can’t predict a file deletion that would give you a better fit
– Don’t know size of brand new file
– Preallocating extra space: internal fragmentation
• Can compact (defragment) files. Tedious operation.
• File “extents”: a modification to contiguous scheme 
2. Linked allocation
• File is linked list of disk blocks (Figure 11.6)
• File’s directory entry points to first & last blocks (in
addition to maintaining other file attributes)
• Avoids disadvantages of contiguous allocation 
– No fragmentation, don’t need to know size in advance, …
• Criticism
– Linked list inefficient to access data “directly” as opposed to
sequentially. Ex. Editor requests to go to 3 millionth line.
– What if 1 of the pointers becomes damaged?
– Minor overhead from the pointer in each block. Can define
“clusters” of the file to be contiguous blocks, but this suffers
some fragmentation.
File allocation table
• Located at start of disk
• Table has an entry for each disk block
– Has fixed size and is indexed by disk block number
– Purpose of these table entries is to point to next block in a file,
like emulating a linked list with an array
• File’s directory entry contains the starting block number.
– See Figure 11.7
• Performance problem:
– Need to do 2 seek operations every time you go to a new block
in the file. Why?
• Direct access with FAT is faster than pure linked
allocation. Why?
3. Indexed allocation
• The file on disk begins with an index block
• Index block contains pointers to the various disk blocks
containing the actual data of the file.
• When file first created, all pointers set to null. One by
one, they get initialized as file grows.
• File’s directory entry contains block number of the index
block. See Figure 11.8
• If all blocks on disk are exactly 1KB in size, how big of a
file can we support using this scheme?
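• Worked answer, assuming 4-byte block pointers (the slide doesn’t fix a pointer size): a 1 KB index block holds 1024 / 4 = 256 pointers, so one index block supports a file of up to 256 × 1 KB = 256 KB.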
Bigger files
• Linked indexed allocation: Can continue pointers in
another index block. In general, can have a linked list of
index blocks.
– How big a file can we have with 2 linked index blocks?
• Multilevel indexed allocation
– Begin with a first-level index block. Entries contain addresses of
second-level index blocks.
– Each second-level index block has pointers to actual file data.
– How big a file can we have?
• Direct & indirect indexed allocation
– File’s directory entry can hold several block numbers itself.
– Followed by: single indirect block, double indirect block, triple
indirect block. See figure 11.9 
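• Under the same 4-byte-pointer assumption: two linked index blocks give 2 × 255 data pointers (one slot per block is spent on the link) ≈ 510 KB, while a two-level scheme gives 256 × 256 = 65,536 data blocks = 64 MB.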
Free space
• As long as volume is mounted, system maintains a free-space “list” of unused blocks. Question is how to represent this info.
• Bit vector: how much space? Keep it in memory?
• Collect all free blocks into linked list.
– We don’t typically traverse this list. Just grab/insert one.
• Grouping technique
– First “free” block used to store addresses of n – 1 actual free
blocks. Last address stores location of another indirect block of
free addresses.
• Counting: store address along with number of
contiguous free blocks
Efficiency
• Where should inodes be on the disk? All in one place or
scattered about?
• Using linked allocation (treating data blocks as a linked
list on disk)
– How to keep a lid on the number of nodes in a list?
– How to reduce internal fragmentation?
• File metadata may include the last time file accessed
– How expensive is this operation? Response/alternative?
• Size of pointers (locations holding address)
• Should the system’s global tables (process, open files)
be fixed or variable length?
Performance
Some techniques to optimize disk usage
• Disk controller: store contents of a whole track
– Why? What is necessary to accomplish this?
• Buffer cache and page cache
– “Cache” a file’s data blocks / physical pages of virtual memory
• Caching the pages may be more efficient:
– Pages can be individually larger than individual data blocks
– Fewer computational steps to do virtual memory access than
interfacing with the file system.
• Not too efficient to employ both kinds of caches
– “Double caching problem” with memory-mapped I/O: data first
arrives into a page cache because the device is paged… and
then copied to/from buffer cache
Performance (2)
• Do you want writes to be synchronous or asynchronous?
– Pass a parameter to open( ) system call to specify which you
want.
– Which is better typically?
– When one is preferred over the other?
• Page replacement: default policy like LRU may be bad
in some situations
– Consider sequential access to a file: Remove a page as soon as
the next one is in use. Request the next few pages in advance.
Recovery
• Need to protect from
– Loss of data
– Inconsistency (corruption) of data – resulting from what?
• Consistency checking
– Scan file metadata, see if it all makes sense
– See if data blocks match a file correctly: traverse all pointers
– Check free block list
– What if something is wrong?
– Is some information more critical than others? What extra protection to give?
• Log transactions: tell which operations are pending, not
complete
• Full vs. incremental backups
Network file system
• Principles
– Each machine has its own file system
– Client-server relationships may appear anywhere
– Sharing only affects the client
• To access remote directory, mount it
– A remote directory is inserted in place of an existing (empty)
directory whose contents now becomes hidden.
– It will then look & behave like part of your local file system
– Supports user mobility: access your files anywhere in the
network
• Protocol
– Server has list of valid file systems that can be made available,
and access rights (for each possible client)
CS 346 – Chapter 13
• I/O systems
– Hardware components
– Polling & interrupts
– DMA: direct memory access
– I/O & the kernel
• Commitment
– Please read chapter 13.
I/O
• Challenge: so many different I/O devices
• Try to put a lid on complexity
– Classify I/O by how they behave
– All devices should have a common set of essential features
• Each device has a controller (hardware / circuitry) that is
compatible with the host machine.
– Process running on CPU needs to read/write values in registers
belonging to an I/O controller
• Corresponding device driver installed as part of the OS
– Communicates with controller
– I/O instructions ultimately “control” the devices
• Devices can have memory addresses allocated to them
Concepts
• Port – physical connection point between device and
computer
• Bus – set of wires connecting 1+ devices
– The bus itself is connected to the port, and devices are
connected to the bus
– Figure 13.1: Notice controllers connected to bus
– System enforces some protocol for communication among the
devices along this bus
• Daisy chain – another way to group devices
– One device is connected directly to the computer
– Each other device is connected to another device along the
chain. Think of it as a linked list of devices, with the first device
directly connected.
Memory mapped I/O
• Some of RAM is reserved to allow processes to
communicate with I/O controllers
• We read/write data to specific address
– This address is assigned to a specific port → identifies the device
– Each device is given a specific range of addresses: Fig. 13.2
– Address also signifies meaning of the value. E.g. Status of I/O
request, command to issue to controller, data in, data out
• An I/O instruction can immediately get/set a value in
controller’s register
Polling & interrupts
• When working with an I/O device, we need to determine
its state: is it ready/busy, did it encounter an error or
complete successfully?
• Polling = busy-wait cycle to wait for answer from device.
Periodically check status of operation 
• Interrupt – let the I/O device inform me
– Device sends signal along an interrupt-request line
– CPU detects signal and jumps to predefined interrupt handling
routine. (Need to save state while away) Figure 13.3
– Nature of signal allows us to choose appropriate handler
– Some interrupts maskable: can ignore
– What I/O interrupts do we encounter?
Direct memory access
• Used for large transfers of data
– E.g. reading contents of a file into memory
• DMA controller does I/O between device and memory
independent of and parallel with CPU execution
• Figure 13.5 example
– Process in CPU sends command to DMA controller identifying
source and destination locations.
– CPU goes about its business. DMA controller & device driver do
the rest, communicating with the disk controller.
– DMA tells disk to transfer a chunk of data to memory location X.
– Disk controller sends individual bytes to DMA controller
– DMA controller keeps track of progress. When done, interrupt
CPU to announce completion.
Application I/O interface
• In order to help the OS define appropriate system calls,
we need to know what devices can do for us
• Classify device personality. Such as:
– Character-stream or block?
– Sequential or random access desired?
– Synchronous or asynchronous, i.e. predictable or unpredictable
response times?
• Example devices (Figure 13.7)
– Terminal is character-stream oriented
– Disk is block oriented, and can both read & write data
– Keyboard is asynchronous
– Graphics card is write-only
• Question for next time: what use can we make of clock?
• I/O systems, continued
– …Features of I/O system calls
– Kernel responsibilities
– Buffer, cache, spool
– Performance issues
System call behavior
• Clocks
– Some I/O requests may be periodic, or set to occur at a specific
time
– Fine grain (cycle): look up the time
– Coarse grain: HW clock generates timer interrupts approx. every
1/60 of a second. Why so seldom?
• Blocking vs. nonblocking I/O
– Blocking: put yourself to sleep while waiting for completion.
More straightforward to code
– Nonblocking: you want to keep going while waiting. Response
time is important. Example?
• If it’s short and quick: have another thread get the data
• Usually: use asynchronous system call, and wait for I/O
interrupt or “event” to take place
Kernel’s job
• I/O scheduling
– Critical task since inherently slow
– 2 goals: minimize average response time; fairness
– Rearrange order of I/O requests as they enter “queue”
– Ex. Using the elevator algorithm for disk access
• Error handling
– Transient failures occur: prepare to retry I/O calls
– I/O system calls can return an errno
• Protection
– We don’t want users to directly access I/O instructions.
– All I/O requests need to be checked by kernel
– Memory-mapped I/O areas should be off limits to direct user intervention. (unnecessary and invites bugs)
Buffers
• Memory area between device and application
temporarily holding data. Motivation…
• Different speeds of producer & consumer (Fig. 13.10)
– Would be nice to do one disk operation; wait until a whole disk
block can be written to, not just 1 line of text.
– Why do we use “double buffering”?
• Different size units
– Not everything is the same size as a disk block, page frame,
TCP packet, etc.
• Spool = buffer where output cannot be interleaved from
different sources: printing
– Create temporary “file” for each print job → print queue
– Managed by dedicated daemon process
Preparing HW ops
• Many steps, common example is reading file from disk
• Before we can communicate with disk controller, need to
locate file
– File system identifies the device containing the file (how?)
– Determine which disk blocks comprise the file (how?)
• Life cycle of I/O request begins!
Note that:
– A device has a wait queue (why?)
– Use DMA if the amount of data is large
– Small data can be kept in a buffer
– Lots of signalling/interrupts going on
– End result: I/O system call returns some value to user process.
Let’s go through the steps 
Performance issues
• I/O incurs many interrupts
• Each interrupt causes a context switch – we’d like to
minimize these
• Ex. When logging in to a remote machine, don’t create a
network message for every keyboard interrupt
• Don’t copy data too many times unnecessarily
• Where to implement I/O: various levels:
– User space, kernel space, device driver
– Microcode on device controller or in the makeup of device
• Trends to observe among the levels (Fig. 13.16)
– Cost of mistake; efficiency; development cost; flexibility
CS 346 – Chapter 14
• Protection (Ch. 14)
– Users & processes want resources. Protection means
controlling their access.
– More than just RWX.
• Security (Ch. 15)
– Preserving integrity of system & its data
Background
• Protect from …
– Malicious, unauthorized or incompetent users
– Waste (e.g. accessing expensive equipment just because
cheaper resource is busy)
• Distinguish between: policy & mechanism
• Principle of least privilege
– Minimum damage in case of error
– Easier to identify who did what
– Create user accounts, and tailor privileges accordingly
• Bipartite relationship
– Processes vs. objects
– Ex. What files does a process have access to?
– More practical to organize privileges by user
Access control matrix
• Butler Lampson, 1969.
• Express our policies: how subjects (users/processes)
can use each object
– For each subject & each object, state the access rights
– Can be unwieldy in general!
• Protection domain
– Set of common access rights
– Usually correspond to a user or class of users
Ex. Students, faculty, guests, system administrators
– Process runs inside a domain determined by its owner
– Domains may coincidentally overlap (Figure 14.1)
Domains
• Representation as 2-D table
– Rows are the domains
– Columns are objects
– Entries in table specify access rights (Fig. 14.3)
• A user can only be in 1 protection domain at any given
time.
– Static: a user/process always operates in the same domain
(simple but inflexible)
– Dynamic: a user/process can switch to another domain
(complex but flexible)
Can represent this way: domains are objects that a user in some
domain can “switch” to. See Fig. 14.4.
• UNIX: some programs have setuid bit set to allow
domain switching.
Example
Domain      Resource 1        Resource 2    Resource 3
Admin       Execute           Write         Execute
Students    Execute           Read, Copy    Execute
Faculty     Owner, Execute
• In addition to read/write/execute, special powers 
• Copy: you can “copy” an access right for this object to
another domain.
• Owner: You can create/delete access rights for this
object
Implementation
• In theory, access control matrix is a huge table
– Logically it’s 3 dimensional (capability is 3rd dimension)
– Sparse: few rows, thousands of columns 
– Waste of virtual memory, I/O to look up this separate table
• Access list for objects
– Each object (file or other resource) will have attribute identifying
what can be done by members of each domain
– Can define a default to save space
• Capability list for domains
– List what I have access to, and what I can do with it
– We don’t want users to arbitrarily change their capabilities!
Capability information must be protected. How?
Some questions
• What should we do about objects that have no access
rights defined?
• How would we implement a policy limiting the number of
times a resource is accessed?
• How would we implement a policy allowing access only
during certain times of day?
CS 346 – Chapter 15
• Security
– Physical, human, program
– Authentication
– Dictionary attack
– Cryptography
– Defense policies
Areas of security
Attackers look for every opportunity to get in
• Physical
– Restricting access: guards, locked doors
– Sounds simple, but don’t neglect!
• Human factors
– Naivete, laziness, dishonesty
– Help users pick good passwords, other recommended practices
– How to handle offenders or people with a history
• Program
– Correct algorithm, installation of software
– Used in the way originally intended
– Proper behavior vs. malicious code
Coding errors
• Not validating input correctly
– A program to support a client remotely accessing the server
through commands
– Input command is scrutinized for safety: limited to “safe”
commands.
– But if we parse the command incorrectly, we may actually
perform unsafe operation unwittingly
• Synchronization problem
– mkdir could be executed in 2 steps: kernel creates new empty
subdirectory and assigns it to root. Then, ownership is
transferred to the user who executed mkdir.
– In between the 2 steps: If the system is busy, evil user can
execute a command to replace the new directory with a link to
some other existing file on the system.
Malicious code
• Trojan horse
– 2 purposes: one obvious & benign; the other hidden and evil
– Designed to appear like ordinary, beneficial program. “eat me”
• Root kit
– Trojans that replace system utility files
– Suppose you break into a system, and install programs that
allow you secret access. System admin can find evidence of
your intrusion, look at system logs of your files and work. What
can you do to cover your tracks?
• Trap door
– Flaw in a program placed there by designer. Bypasses security
checks under some circumstances. May originally have been
debugging mode.
– Ex. Special access code
Malicious (2)
• Virus
– Fragment of code that spreads copies of itself to other programs
– Requires a host program
– Ex. May append/prepend its instructions to existing program
– Every time program runs, virus code is executed, in order to spread itself & perhaps do other “work”
• Virus scanning technique
– Read program code for “signature” of known viruses. In other
words, look for substring of code that is unique to the virus.
– But… virus may be polymorphic
– New viruses keep appearing
Malicious (3)
• Worm
– Like a virus, but it’s a stand-alone program that replicates itself
and spreads.
– Also can contain code to do other “work”
Example: Robert Morris, 1988
• Included a special module called the “grappling hook”
– Install itself on remote system
– Make network connection back to original system
– Transfer rest of worm to new victim
– Execute worm on victim
• Worm designed to exploit weaknesses in existing UNIX
utility programs
Morris exploits
• sendmail program
– Debug option: allowed an e-mail message to specify a program
as its recipient. This program would run, using e-mail message
body as its input.
– Worm created an e-mail message, containing grappling hook
code…. Instructions to remove mail headers…. Resulting
program passed to shell
• finger daemon
– Exploited buffer overflow by “fingering” a very long name. When
procedure called, it overwrote correct return address with
address of grappling hook code.
• 2 other exploits involved remote shell applications
– Attempted to crack passwords
• What happened to Morris himself?
Dictionary attack
• We can use a hash function to encode passwords
– No way to compute decoded value, so we don’t have to worry
about password table being compromised
• Attacker’s strategy
– Get the password table. Administrator complacently left it
unprotected.
– Compile a dictionary of thousands of common words; compute
the hash value of each.
– Look for matches between dictionary and values in password
table.
• Prepare for the threat
– Ask people to pick strange passwords, or force them to use a
predefined one… that’s hard to remember.
– Salt the password table
Salt
• A random string that is appended to a password before
being hashed.
• When user logs in, password is concatenated with salt
value, hashed, and checked against entry in password
table.
• Attacker must now expand dictionary to contain every
possible salt value with every possible password.
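A sketch of the salted check in C. The hash here is djb2, a stand-in for a real cryptographic hash; the salt, password, and names are all made up.

#include <stdio.h>

unsigned long hash(const char *s) {     /* djb2 -- NOT crypto-strength */
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

int check_password(const char *typed, const char *salt,
                   unsigned long stored_digest) {
    char buf[256];
    snprintf(buf, sizeof buf, "%s%s", typed, salt);  /* append the salt */
    return hash(buf) == stored_digest;
}

int main(void) {
    const char *salt = "x9Q2";          /* random; stored in the clear */
    unsigned long stored = hash("opensesamex9Q2");   /* set at enrollment */
    printf("login %s\n", check_password("opensesame", salt, stored)
                             ? "accepted" : "rejected");
    return 0;
}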
Cryptography
• Generally not feasible to build a totally secure network.
• Goal: secure communication over unsecure medium
– Key = secret information used to encode/decode message
– Recipient verifies the message it receives is from correct sender
– Sender wants to ensure only the recipient will understand msg
• Encryption algorithm: how to secure messages
– Encryption function: (plaintext, key) → ciphertext
– Decryption function: (ciphertext, key′) → plaintext
– Keeping the decryption key secret matters more than keeping the
encryption method secret.
• Types
– Symmetric: same key for both; decryption is analogous to encryption
– Asymmetric: different keys; breaking it is far more computationally
expensive
Examples
• Caesar cipher; substitution ciphers
– There are 26! ways letters can be reassigned in a general
substitution cipher; the Caesar cipher is just the 25 pure shifts
– What is the “key”? Is this method secure? (See the sketch after
this list.)
• One-time pad (e.g. JN-25)
– Dictionary table: convert each word to a 5-digit number
– Additive table: add the next random number to each word
– Preface the message by indicating where in additive table you
are starting the encoding
– Tables may be periodically changed.
– Example: encryption code book.xlsx
• Data encryption standard
– Manipulate 64-bit chunks at a time, using XOR and shift
operators.
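A quick sketch of the Caesar cipher: the key is just the shift amount, so there are only 25 distinct nontrivial keys and brute force is trivial.

def caesar(text: str, shift: int) -> str:
    """Rotate each letter by `shift` positions; non-letters pass through."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

msg = caesar("ATTACK AT DAWN", 3)      # 'DWWDFN DW GDZQ'
assert caesar(msg, -3) == "ATTACK AT DAWN"
# Breaking it: just try every shift
# for k in range(26): print(k, caesar(msg, -k))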
RSA
• Choose distinct 512-bit random primes p and q
• Let N = pq, and let M = (p – 1)(q – 1)
• Choose public encryption key e: a value less than and
relatively prime to M.
– Message is x. Sender transmits: y = x^e mod N
• Choose private decryption key d: where ed mod M = 1
– e and N are public; outsider should have a tough time factoring
N to obtain p and q to determine d
– Recipient computes: z = y^d mod N, which should equal x.
• Example
p = 31, q = 41 → N = 1271, M = 1200, e = 7, d = 343
x = 12 → y = 12^7 mod 1271 = 1047; z = 1047^343 mod 1271 = 12
Note: compute these with fast modular exponentiation (square and
multiply, reducing mod N at each step), not repeated multiplication
Example
• Choose secret primes p, q → p = 31, q = 41
• N = pq; M = (p – 1)(q – 1) → N = 1271, M = 1200
• Choose e < M and relatively prime to M → e = 7
• Message is x; compute and send y = x^e mod N → x = 12,
y = 12^7 mod 1271 = 1047
• Pick private decrypt key d where ed mod M = 1 → d = 343
• z = y^d mod N, which should equal x → z = 1047^343 mod 1271 = 12
It works!
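The whole example fits in a few lines of Python; three-argument pow does the fast modular exponentiation the note above calls for, and pow(e, -1, M) (Python 3.8+) computes the modular inverse:

from math import gcd

p, q = 31, 41                     # toy primes from the slide
N, M = p * q, (p - 1) * (q - 1)   # N = 1271, M = 1200
e = 7
assert gcd(e, M) == 1             # e must be relatively prime to M
d = pow(e, -1, M)                 # private key: d = 343, since e*d mod M == 1

x = 12                            # message
y = pow(x, e, N)                  # encrypt: 12^7 mod 1271 = 1047
z = pow(y, d, N)                  # decrypt: 1047^343 mod 1271 = 12
assert (y, z) == (1047, x)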
Diffie - Hellman
• Method for 2 people to establish a shared secret key
• Choose values p (prime) and q
• Sender
– chooses secret value a, and computes A = q^a mod p
– Sends A, p, q (all in the clear)
– Eavesdropper cannot easily determine a: recovering a from A is the
discrete-logarithm problem
• Receiver
– Chooses secret value b
– Computes B = q^b mod p and K = A^b mod p
– Sends B back to sender, who can compute K = B^a mod p
• Both methods of computing secret K are equivalent
– A^b mod p = (q^a)^b mod p = q^(ab) mod p
– B^a mod p = (q^b)^a mod p = q^(ab) mod p
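The exchange in a few lines of Python, with a toy prime for readability (a real p is hundreds of digits, which is what makes recovering the secret exponents hard):

import secrets

p, q = 23, 5                        # public: toy prime p and base q

a = secrets.randbelow(p - 2) + 1    # sender's secret exponent
A = pow(q, a, p)                    # sender transmits A (plus p, q)

b = secrets.randbelow(p - 2) + 1    # receiver's secret exponent
B = pow(q, b, p)                    # receiver transmits B

K_sender   = pow(B, a, p)           # B^a mod p = q^(ab) mod p
K_receiver = pow(A, b, p)           # A^b mod p = q^(ab) mod p
assert K_sender == K_receiver       # shared key K, never sent on the wire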
Digital signature
• Used to authenticate origin of message
– Also useful if later sender denies ever sending the message
• Sender
– Computes hash value of message → 128/160-bit result
– Applies D function (using private key) → “signature block”
– Appends signature block to the message to send
• Receiver
– Applies E function (using sender’s public key) → hash
– Computes hash value of message and checks for a match
• Efficient, since the E & D functions are applied only to a small
amount of data; the message body itself need not be confidential.
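A toy sketch of sign/verify, reusing the RSA numbers from the earlier slides (N = 1271 is hopelessly small for real use; real signatures use 2048-bit keys and keep the full 256-bit hash rather than reducing it):

import hashlib

N, e, d = 1271, 7, 343              # toy RSA key from the earlier slides

def sign(message: bytes) -> int:
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % N
    return pow(h, d, N)             # "D function" with the private key

def verify(message: bytes, sig: int) -> bool:
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % N
    return pow(sig, e, N) == h      # "E function" with sender's public key

msg = b"pay Bob $100"
s = sign(msg)
print(verify(msg, s))               # True
print(verify(b"pay Bob $1000", s))  # almost surely False (toy modulus)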
Doing security
• Defense in depth: don’t rely on just 1 catch-all method
• Some attackers know intimate details of your system and
how you operate
– Attackers may make some assumptions; surprises slow them
down
• Penetration test. Look for:
– Bad passwords
– Programs that look or behave abnormally
• Using setuid when not necessary
• In system directory when not necessary
• Too many daemons
– Unusual file permissions, search paths, modification dates
– Old versions of software
Intrusion detection
• What data do you want to collect?
• When is a real-time response required?
• What to scan:
– System calls, shell commands, network packets
• Possible responses
– Kill the process
– Surreptitiously alert the admin
– Have honeypots ready for the attacker
• How to detect
– Signature-based: look for specific string or behavior pattern
• Must know what to look for
– Anomalies from normal operating specifications
• But, what is normal?
Anomaly detection
• Establish accurate benchmarks of normal operation
– Ex. How often do we get pinged from China?
• False positive = false alarm: alert human, but no intrusion
• False negative = we missed an intrusion
• Deciding when to alert a human is critical; if most alerts are
false, people will start ignoring the alarms
• Example
– 20 out of 1,000,000 records show intrusion
– System detects/alerts 80% of these intrusion events
• 16 records revealed, 4 ignored
– System falsely identifies 0.01% of normal events as an intrusion
• 0.01% of 999,980 = ~ 100 false alarms
– From the human’s point of view, 100/116 ≈ 86% of alarms are false
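The base-rate arithmetic from the slide, checked directly:

total       = 1_000_000
intrusions  = 20
detect_rate = 0.80                 # fraction of intrusions detected
fp_rate     = 0.0001               # 0.01% of normal events flagged

true_alarms  = intrusions * detect_rate              # 16
false_alarms = (total - intrusions) * fp_rate        # ~100
share = false_alarms / (true_alarms + false_alarms)  # ~0.86
print(f"{share:.0%} of alarms are false")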