UNIX INTERNALS

UNIX Tools
G22.2245-001, Fall 2000
Danielle S. Lahmani
email: lahmani@cs.nyu.edu
Lecture 13
2000 Copyrights, Danielle S.
Lahmani
UNIX Internals: Motivations
• Knowledge of UNIX Internals helps in:
– understanding similar systems (for example,
NT, LINUX)
– designing high performance UNIX applications
WHAT IS THE KERNEL?
• Part of UNIX OS that contains code for:
– controlling execution of processes (creation,
termination, suspension, communication)
– scheduling processes fairly for execution on the
CPU.
– allocating main memory for exec of processes.
– allocating secondary memory for efficient
storage and retrieval of user data.
– Handling peripherals such as terminals, tape
drives, disk drives and network devices.
Kernel Characteristics:
• Kernel loaded into memory and runs until
the system is turned off or crashes.
• Mostly written in C with some assembly
language written for efficiency reasons.
• User programs make use of kernel services
via the system call interface.
• Provides its services transparently.
Kernel Subsystems
• File system
– Directory hierarchy, regular files, peripherals
– Multiple file systems
• Process management
– How processes share CPU, memory and signals
• Input/Output
– How processes access files, terminal I/O
• Interprocess Communication
• Memory management
System V and BSD have different implementations of
different subsystems.
TALKING TO THE KERNEL
• Processes access kernel facilities via
system calls
• Peripherals communicate with the kernel
via hardware interrupts.
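As a small illustration of a process using kernel facilities through system calls, the sketch below relies on Python's `os` module, whose `pipe`, `write`, `read` and `close` functions are thin wrappers over the system calls of the same names. The helper name `echo_through_kernel` is made up for this example, and the message is assumed to be small enough to fit in the kernel's pipe buffer.

```python
import os

def echo_through_kernel(message: bytes) -> bytes:
    """Send bytes through a kernel pipe and read them back."""
    read_fd, write_fd = os.pipe()        # pipe(): kernel creates a pipe, returns two descriptors
    try:
        os.write(write_fd, message)      # write(): copies user data into the kernel's pipe buffer
        os.close(write_fd)               # close(): with the write end gone, read() sees end of file
        write_fd = -1
        return os.read(read_fd, len(message) + 1)  # read(): copies the data back to user space
    finally:
        os.close(read_fd)
        if write_fd != -1:               # only reached if write() failed before the close above
            os.close(write_fd)

print(echo_through_kernel(b"hello kernel"))
```

Every call in the function crosses from user mode into the kernel and back; the process never touches the pipe buffer directly.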
EXECUTION IN USER MODE AND
KERNEL MODE
• Kernel contains several data structures
needed for implementing kernel services.
These structures include:
• Process table: contains an entry for every
process in the system
• Open-file table: contains at least one entry
for every open file in the system.
Execution in kernel mode and
user mode
• When a process executes a system call, the
execution mode of the process changes
from user mode to kernel mode.
– Processes in user mode can access their own
instructions and data but not kernel instructions
and data structures.
– In kernel mode, a process can access system
data structures, such as the process table.
Flow of Control during a
System call
• User process invokes a system call (for
example open( ))
• Every system call is allocated a code
number at system initialization.
– C runtime library version of the system call
places the system call parameters and the system
call code number into machine registers and
then executes a trap machine instruction,
switching to kernel code and kernel mode.
Flow of control of a system call
• trap instruction uses the system call number as an
index into a system call vector table (located in
kernel memory) which is an array of pointers to
the kernel code for each system call.
• Code corresponding to system call executes in
kernel mode, modifying kernel data structures if
necessary.
• The kernel then performs a special "return" instruction
that flips the machine back into user mode and returns
to the user process's code
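The dispatch through the system call vector table can be sketched as a toy simulation. This is not real kernel code: the call numbers, the `state` dictionary and the function names below are all invented for illustration, and a Python list of functions stands in for the array of pointers to kernel code.

```python
# Hypothetical system call numbers; a real kernel assigns its own.
SYS_GETPID = 0
SYS_ADD = 1

def sys_getpid(state, args):
    assert state["mode"] == "kernel"     # kernel code only ever runs in kernel mode
    return state["pid"]

def sys_add(state, args):
    return args[0] + args[1]

# The "system call vector table": index = call number, entry = kernel routine.
SYSCALL_TABLE = [sys_getpid, sys_add]

def trap(state, number, *args):
    """Simulate the trap: enter kernel mode, dispatch by number, return to user mode."""
    if not 0 <= number < len(SYSCALL_TABLE):
        return -1                        # unknown call number: report an error
    state["mode"] = "kernel"
    try:
        return SYSCALL_TABLE[number](state, args)
    finally:
        state["mode"] = "user"           # stands in for the special "return" instruction

def getpid(state):
    """Library stub: packages the call number and traps into the kernel."""
    return trap(state, SYS_GETPID)

process = {"pid": 42, "mode": "user"}
print(getpid(process), process["mode"])
```

The stub `getpid` plays the role of the C runtime library wrapper: user code never indexes the table itself, it only supplies a number.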
SYNCHRONOUS VS ASYNCHRONOUS
PROCESSING
• Usually, processes performing system calls cannot
be preempted.
• Processes must voluntarily relinquish the CPU, for
example while waiting for I/O to complete.
• Kernel sends a process to sleep and will wake it up
when I/O is completed.
• The scheduler does not allocate a sleeping process
any CPU time and will allocate the CPU to other
processes while the hardware device is servicing
the I/O request.
INTERRUPTS AND EXCEPTIONS
• UNIX system allows devices such as I/O
peripherals and clock to interrupt CPU
asynchronously.
• On receipt of the interrupt, kernel saves its current
context (frozen image of what the process was
doing), determines cause of interrupt and services
the interrupt.
• Devices are allocated an interrupt priority based on
their relative importance.
• When the kernel services an interrupt, it blocks
out lower priority interrupts but services higher
priority interrupts.
PROCESSOR EXECUTION LEVELS
• Kernel must sometimes prevent the occurrence of
interrupts during critical activity to avoid
corruption of data.
• Typical interrupt levels, from higher priority to
lower priority:
• Machine Errors
• Clock
• Disk
• Network Devices
• Terminals
• Software Interrupts
Interrupts
• Interrupts are serviced by kernel interrupt
handlers which must be very fast to avoid
losing any interrupts.
• If an interrupt of higher priority occurs
while a lower priority interrupt is being serviced,
nesting will occur and the higher priority interrupt
is serviced first.
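The nesting rule can be made concrete with a small tick-by-tick simulation. This is only a sketch under assumed conventions: priorities are non-negative integers where a larger number is more important, each handler needs a whole number of ticks, and the function name `simulate` and the tuple layout are invented for the example.

```python
def simulate(interrupts):
    """interrupts: list of (arrival_tick, priority, name, ticks_needed).

    Returns a log of ("start", name) and ("done", name) events.
    """
    pending = sorted(interrupts)         # not yet raised, ordered by arrival tick
    waiting = []                         # raised, but blocked by the running handler
    stack = []                           # nested handlers; the top of the stack is running
    log = []
    tick = 0
    while pending or waiting or stack:
        while pending and pending[0][0] <= tick:
            _, priority, name, ticks = pending.pop(0)
            waiting.append([priority, name, ticks])
        waiting.sort(key=lambda w: -w[0])            # most important first
        current_level = stack[-1][0] if stack else -1
        if waiting and waiting[0][0] > current_level:
            stack.append(waiting.pop(0))             # higher priority: nest on top
            log.append(("start", stack[-1][1]))
        if stack:
            stack[-1][2] -= 1                        # running handler uses one tick
            if stack[-1][2] <= 0:
                log.append(("done", stack.pop()[1]))
        tick += 1
    return log

for event in simulate([(0, 1, "terminal", 3), (1, 3, "disk", 2), (2, 5, "clock", 1)]):
    print(event)
```

In this run the terminal handler is interrupted by the disk, which is in turn interrupted by the clock; the handlers then finish in the reverse order, clock first and terminal last.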
DISK ARCHITECTURE:
• Disk is split in two ways: sliced like a pizza into
wedges called sectors
• And subdivided into concentric rings called
tracks.
• Blocks are individual areas bounded by the
intersection of sectors and tracks; they are the
basic units of disk storage.
• Typical blocks can hold 4K bytes.
Disk architecture (cont’)
• Several variations of disk architecture: many disks
contain several platters, stacked one upon the
other. In these systems, a collection of tracks with
the same index number is called a cylinder.
• Big issue: sequential reads are much faster than
random ones (factor of 10 to 15)
• When a sequence of contiguous blocks is read,
there is a delay between each block due to the
latency of the communication between the disk
controller and the device driver.
Disk architecture (con’t)
• Want consecutive data to be on the same track
though not consecutive on the track. See
interleaving techniques, wherein consecutive
blocks are three sectors apart.
• Extent file systems support large consecutive
chunks at once (needed for data intensive
applications).
• I/O is always done in terms of blocks.
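The interleaving idea mentioned above can be sketched as a layout calculation: logical block i+1 is placed a fixed number of sectors after logical block i, sliding forward to the next free sector when the slot is already taken. The function name `interleave_layout` and the track size of 8 sectors are assumptions made for this example.

```python
def interleave_layout(sectors_per_track, step):
    """Return a list where layout[physical_sector] = logical block number."""
    layout = [None] * sectors_per_track
    pos = 0
    for logical in range(sectors_per_track):
        while layout[pos] is not None:       # slot taken: slide to the next free sector
            pos = (pos + 1) % sectors_per_track
        layout[pos] = logical
        pos = (pos + step) % sectors_per_track
    return layout

print(interleave_layout(8, 3))   # → [0, 3, 6, 1, 4, 7, 2, 5]
```

With a step of 3, the disk rotates past two other sectors between logical blocks 0 and 1, which gives the controller and driver time to finish one transfer before the next wanted block arrives under the head.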
THE FILE SUBSYSTEM
• Supports:
• Regular files
• Directories
• Special files, which correspond to peripherals such
as tapes, terminals or disks, and to interprocess
communication mechanisms such as pipes and sockets.
INODES
• Contains permissions, owner, groups and last
modification times.
• Type of file: regular, directory or special file
• If it is a symbolic link, the value of the symbolic
link.
• If it is a regular file or directory, contains the
location of its disk blocks:
Inode (con’t)
• Direct pointers to blocks 0 to 9
• Indirect pointer to an entire block of pointers,
which addresses blocks 10 .. 1033.
• Double indirect pointer (in primary inode)
to a block that is just pointers to other
blocks, each of which holds 1024 pointers
to data blocks.
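The pointer scheme above implies simple arithmetic for finding which pointer covers a given file block. The sketch below uses the figures from the slides (10 direct pointers, 1024 pointers per indirect block) plus the 4K block size mentioned earlier; the function names and the tuple format of the result are invented for the example.

```python
NUM_DIRECT = 10          # direct pointers in the inode (blocks 0 to 9)
PTRS_PER_BLOCK = 1024    # pointers held by one indirect block
BLOCK_SIZE = 4096        # bytes per block (the "4K" figure)

def locate_block(n):
    """Say which inode pointer leads to logical file block n."""
    if n < 0:
        raise ValueError("block number must be non-negative")
    if n < NUM_DIRECT:
        return ("direct", n)
    n -= NUM_DIRECT
    if n < PTRS_PER_BLOCK:
        return ("single indirect", n)            # slot inside the indirect block
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:
        # first index picks the second-level block, second index picks the data block
        return ("double indirect", n // PTRS_PER_BLOCK, n % PTRS_PER_BLOCK)
    raise ValueError("beyond the range covered by the double indirect pointer")

def locate_offset(byte_offset):
    """Same lookup, starting from a byte offset within the file."""
    return locate_block(byte_offset // BLOCK_SIZE)

# Blocks reachable through direct, single indirect and double indirect pointers.
MAX_BLOCKS = NUM_DIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK ** 2

print(locate_block(1033), locate_block(1034), MAX_BLOCKS)
```

Small files are reached with no extra disk reads beyond the inode, while blocks behind the double indirect pointer cost two additional block reads.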
LAYOUT OF THE FILE SYSTEM
• File system has the following structure:
• First logical block: boot block for starting OS.
• Second logical block: superblock that contains
information about free pages and inode list.
• Following is the inode list which is a list of
inodes. Administrators specify size of inode list
when configuring the file system. Kernel
references inodes by index into the inode list.
• The data blocks start at the end of the inode list
and contain file data and administrative data.
CONVERSION OF PATHNAME TO AN
INODE
• Initial access to a file is through its
pathname. The kernel needs to translate a
pathname to inodes to access files.
• The algorithm namei parses the pathname
one component at a time, converting each
component into an inode based on its name
and the directory being searched and
eventually returns the inode of the input
path name.
Namei ALGORITHM
– if pathname is absolute, then search starts from
the root inode
– if pathname is relative, search is started from
the inode corresponding to the current working
directory of the process (kept in the process u
area)
– the components of the pathname are then
processed from left to right. Every component,
except the last one, should either be a directory
or a symbolic link. Let's call the intermediate
inodes the working inodes.
Namei algorithm (cont’)
– If the working inode is a directory, the current
pathname component is looked for in the
directory corresponding to the working inode.
If it is not found, namei returns an error; otherwise,
the value of the working inode number
becomes the inode number associated with the
located pathname component.
Namei (cont’)
– If the working inode corresponds to a symbolic
link, the pathname up to and including the
current path component is replaced by the
contents of the symbolic link, and the pathname
is reprocessed.
– The inode corresponding to the final pathname
component is the inode of the file referenced by
the entire pathname
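The namei steps can be sketched over a toy in-memory inode table. This is an illustration only, not the kernel's implementation: the dictionary layout, the root inode number 2, the symbolic link limit and the sample file system are all assumptions, and the final component is followed if it is a symbolic link.

```python
ROOT_INO = 2        # assumed inode number of the root directory
MAX_SYMLINKS = 8    # assumed guard against symbolic link loops

def namei(inodes, path, cwd_ino=ROOT_INO):
    """Translate a pathname into an inode number."""
    working = ROOT_INO if path.startswith("/") else cwd_ino
    components = [c for c in path.split("/") if c]
    links_followed = 0
    while components:                        # process components left to right
        name = components.pop(0)
        directory = inodes[working]
        if directory["type"] != "dir":
            raise NotADirectoryError(name)
        if name not in directory["entries"]:
            raise FileNotFoundError(name)
        found = directory["entries"][name]
        node = inodes[found]
        if node["type"] == "symlink":
            links_followed += 1
            if links_followed > MAX_SYMLINKS:
                raise OSError("too many levels of symbolic links")
            target = node["target"]
            if target.startswith("/"):
                working = ROOT_INO           # absolute link: restart from the root
            # replace the processed prefix by the link contents and reprocess
            components = [c for c in target.split("/") if c] + components
        else:
            working = found                  # found inode becomes the working inode
    return working

# Sample tree: /usr/bin/ls is a file, /bin is a symbolic link to usr/bin.
SAMPLE = {
    2: {"type": "dir", "entries": {"usr": 3, "bin": 6}},
    3: {"type": "dir", "entries": {"bin": 4}},
    4: {"type": "dir", "entries": {"ls": 5}},
    5: {"type": "file"},
    6: {"type": "symlink", "target": "usr/bin"},
}

print(namei(SAMPLE, "/bin/ls"), namei(SAMPLE, "bin/ls", cwd_ino=3))
```

Both lookups end at the same inode: the first crosses the symbolic link from the root, the second is a relative path resolved from the working directory with inode 3.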
MOUNTING FILE SYSTEMS
• When UNIX is started, the directory
hierarchy corresponds to the file system
located on a single disk called the root
device.
• The mount utility allows a super-user to
splice the root directory of a file system into
the existing directory hierarchy.
• File systems created on other devices can be
attached to the original directory hierarchy
using the mount mechanism.
MOUNT(CONT')
• When mount is established, users are
unaware of crossing mount points.
• File system may be detached from the main
hierarchy using the umount utility.
• Links do not work across mounts (System
V)
• Example:
• $ mount /dev/floppy /mtn
• $ umount /mtn
Mount (cont’)
• Kernel maintains a system-wide data
structure called the mount table that allows
multiple file systems to be accessed via a
single directory hierarchy.
• mount( ) and umount( ) system calls modify
this table, in the following manner:
MOUNT (CONT')
• with mount( ), an entry is added with:
• device number containing file system
• a pointer to the root inode of the newly
mounted file system
• a pointer to the inode of the mount point
• a pointer to the filesystem-specific mount
data structure of the newly mounted file
system.
Umount ()
• With umount() several checks are made in
the kernel:
– checks that there are no open files in the file
system to be unmounted
– flushes the superblock and buffered inodes back
to the file system
– removes mount table entry and removes "mount
point" mark from the mount point directory
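The mount table bookkeeping can be sketched with a small class. This models only the fields and checks listed in the slides: the class name `MountTable`, the dictionary keys, and the use of a `dirty` flag as a stand-in for flushing the superblock and buffered inodes are all assumptions made for illustration.

```python
class MountTable:
    """Toy model of the kernel's system-wide mount table."""

    def __init__(self):
        self.entries = {}                    # device number -> mount table entry

    def mount(self, device, fs, mount_point):
        if device in self.entries:
            raise OSError("device already mounted")
        if mount_point["is_mount_point"]:
            raise OSError("mount point busy")
        self.entries[device] = {
            "device": device,                # device number containing the file system
            "root_inode": fs["root_inode"],  # root inode of the newly mounted file system
            "mount_point": mount_point,      # inode of the mount point
            "fs_data": fs,                   # file-system-specific mount data
        }
        mount_point["is_mount_point"] = True

    def umount(self, device):
        entry = self.entries.get(device)
        if entry is None:
            raise OSError("device not mounted")
        fs = entry["fs_data"]
        if fs["open_files"] > 0:             # check: no open files on the file system
            raise OSError("file system busy")
        fs["dirty"] = False                  # stand-in for flushing superblock and inodes
        entry["mount_point"]["is_mount_point"] = False   # remove the "mount point" mark
        del self.entries[device]             # remove the mount table entry

table = MountTable()
mnt = {"is_mount_point": False}
floppy = {"root_inode": 2, "open_files": 0, "dirty": True}
table.mount(7, floppy, mnt)
table.umount(7)
print(mnt["is_mount_point"], floppy["dirty"])
```

The order inside `umount` mirrors the slide: the busy check comes first so that a refused unmount leaves the table and the mount point untouched.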
THE PROCESS SUBSYSTEM:
process states
• Every process on the system can be in one
of 6 states:
– running: process is currently using the CPU
– runnable: ready to run, will run depending on
priority
– sleeping: waiting for an event
– suspended: (e.g., as a result of ctrl Z)
– idle: being created by fork( ), not yet runnable
– zombie: terminated but parent has not accepted
its return value
Example of process state
• For example, when a process issues an I/O
command, it goes to sleep, then
becomes runnable again when the I/O
completes and will run depending on
priority.
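The six states can be tied together in a simplified transition table. The event names are invented for this sketch, waiting for I/O is modeled as the sleeping state, and the table omits transitions a real kernel allows (for example, stopping a process that is not currently running).

```python
# (current state, event) -> next state. Event names are illustrative only.
TRANSITIONS = {
    ("idle", "fork_done"): "runnable",
    ("runnable", "scheduled"): "running",
    ("running", "quantum_expired"): "runnable",
    ("running", "wait_for_io"): "sleeping",
    ("sleeping", "io_complete"): "runnable",
    ("running", "stop_signal"): "suspended",
    ("suspended", "continue_signal"): "runnable",
    ("running", "exit"): "zombie",
}

def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None

def replay(events, state="idle"):
    """Apply a sequence of events, starting from a freshly forked process."""
    for event in events:
        state = next_state(state, event)
    return state

print(replay(["fork_done", "scheduled", "wait_for_io", "io_complete"]))
```

Note that a process whose I/O completes lands in the runnable state, not the running state: it still has to be picked by the scheduler.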
PROCESS COMPOSITION
• code area: executable (text) portion of the
process
• data area: used by the process to contain
static data
• stack area: used by the process to store
temporary data
• user area: holds housekeeping info
• page tables: used for memory management
USER AREA
• Every process has a private user area for
housekeeping information that is used by
the kernel for process management.
• It contains control and status information.
• The contents of the user area are only
accessible when the process is executing in
kernel space.
• The kernel can only access the user area of
the currently running process, and not the
user area of other processes.
PROCESS USER AREA (CONT')
• The important fields in the user area include:
– a pointer to the process table slot of the
currently executing process
– file descriptors for all open files
– internal I/O parameters
– current directory and current root
– process and file size limits
– real and effective user IDs
– an array indicating how a process reacts to
signals
– how much CPU time the process has recently used
PROCESS TABLE
• The process table is a kernel data
structure that contains one entry for
every process in the system.
• The process table contains fields that
must always be accessible to the kernel.
Process entry info
– state: (running, runnable, sleeping, suspended,
idle or zombified)
– process ID and Parent PID
– its real and effective user ID and group ID
(GID)
– location of its code, data, stack and user areas
– a list of all pending signals
– various timers giving process execution time and
kernel resource utilization
THE SCHEDULER
• The scheduler is responsible for sharing
CPU time between competing processes.
• The scheduler maintains a multilevel
priority queue that allows it to schedule
processes efficiently and follows a specific
algorithm for selecting which process
should be running.
Scheduling rules:
• The kernel allocates the CPU to a process
for a time quantum, preempts a process
that exceeds its time quantum and feeds
it back into one of the several priority
queues.
• During every second, processes in the
highest priority non-empty queue are
allocated the CPU in a round-robin fashion.
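A minimal sketch of these scheduling rules follows, assuming that a preempted process rejoins the tail of the same queue it came from. A real UNIX scheduler also recomputes priorities periodically, which this example leaves out; the function name and the `[name, remaining_time]` representation are invented for illustration.

```python
from collections import deque

def run_scheduler(queues, quantum):
    """queues: list of deques of [name, remaining_time]; index 0 is the highest priority.

    Returns the order of execution as a list of (name, time_run) pairs.
    """
    timeline = []
    while True:
        level = next((q for q in queues if q), None)   # highest priority non-empty queue
        if level is None:
            return timeline                            # nothing left to run
        proc = level.popleft()
        ran = min(quantum, proc[1])
        proc[1] -= ran
        timeline.append((proc[0], ran))
        if proc[1] > 0:
            level.append(proc)     # quantum exceeded: preempt and feed back into the queue

ready = [deque([["A", 3], ["B", 1]]), deque([["C", 2]])]
print(run_scheduler(ready, quantum=2))   # → [('A', 2), ('B', 1), ('A', 1), ('C', 2)]
```

Process C in the lower priority queue only runs once the higher priority queue is empty, while A and B take turns within their own queue.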
Scheduler (cont’)
• To support real-time processes, scheduler
needs to be changed so that scheduling is
based on priority inheritance rather than
time quanta. Also, more preemption points
in the kernel are needed.
Context Switch:
• To switch from one process to another,
the kernel saves the process's program
counter, stack pointer and other
important info in the process's user area.
• When the process is ready to run, the
kernel will get this info from the process's
user area.
Loading an executable
• A user compiles the source code of a
program to create an executable file,
which consists of several parts:
– Set of "headers" that describe the attributes of
the file
– Program text
– Machine language representation of initialized data
– Other sections, such as symbol table
information
Loading an executable
• Kernel loads an executable file into memory
during an exec( ) system call.
• Loaded process contains at least 3 parts, called
regions
– Text corresponds to text sections of the
executable file
– Data corresponds to data section of the
executable file
– Stack is automatically created and its size is
dynamically adjusted by the kernel at run time.
Loading an executable
• Compiler generates addresses for a virtual
address space with a given address range.
• Memory Management Unit translates
virtual addresses generated by the
compiler into addresses of physical
memory.
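The translation step can be sketched with a single-level page table. Real memory management hardware differs in detail; the 4096-byte page size, the dictionary used as a page table and the function name `translate` are assumptions made for this example.

```python
PAGE_SIZE = 4096   # assumed page size in bytes

def translate(page_table, virtual_address):
    """Map a virtual address to a physical address.

    page_table maps virtual page numbers to physical frame numbers.
    """
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        # a real MMU would raise a page fault for the kernel to handle
        raise LookupError(f"page fault at virtual address {virtual_address:#x}")
    return page_table[page] * PAGE_SIZE + offset   # offset within the page is unchanged

table = {0: 5, 1: 2}               # virtual page 0 -> frame 5, virtual page 1 -> frame 2
print(translate(table, 0x1004))    # → 8196
```

Because each process has its own page table, two processes can use the same virtual address and still reach different physical memory.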
THE BOOT and INIT PROCESS
• Administrator initializes system through bootstrap
sequence
• On a UNIX system, the bootstrap sequence eventually
reads the boot block (block 0) of a disk and loads
it into memory
• The program contained in the boot block loads the
kernel from the File system (for example, /unix)
• After kernel loaded into memory, boot program
transfers control to the start address of the kernel
and the kernel starts running.
Boot process (cont’)
• After initialization, kernel mounts root file
system and handcrafts environment for
process 0.
• Process 0 forks from within the kernel,
creating process 1.
• Process 1, running in kernel mode, creates
its user-level context by allocating a data
region and attaching it to its address space.
2000 Copyrights, Danielle S.
Lahmani
Boot process (cont’)
• Process 1 copies code from kernel space to
the new region, which forms the new user context
of process 1.
• Process 1 sets up the saved user register
context, "returns" from kernel mode and
executes the code just copied from the kernel.
• Process 1 is now a user-level process and
the text code consists of a call to exec the
/etc/init program.