UNIX Tools G22.2245-001, Fall 2000
Danielle S. Lahmani
email: lahmani@cs.nyu.edu
Lecture 13
2000 Copyrights, Danielle S. Lahmani

UNIX Internals: Motivations
• Knowledge of UNIX internals helps in:
– understanding similar systems (for example, NT, Linux)
– designing high-performance UNIX applications

WHAT IS THE KERNEL?
• Part of the UNIX OS that contains code for:
– controlling execution of processes (creation, termination, suspension, communication)
– scheduling processes fairly for execution on the CPU
– allocating main memory for the execution of processes
– allocating secondary memory for efficient storage and retrieval of user data
– handling peripherals such as terminals, tape drives, disk drives and network devices

Kernel Characteristics:
• The kernel is loaded into memory and runs until the system is turned off or crashes.
• Mostly written in C, with some assembly language written for efficiency reasons.
• User programs make use of kernel services via the system call interface.
• Provides its services transparently.

Kernel Subsystems
• File system
– Directory hierarchy, regular files, peripherals
– Multiple file systems
• Process management
– How processes share CPU, memory and signals
• Input/Output
– How processes access files, terminal I/O
• Interprocess communication
• Memory management
System V and BSD have different implementations of different subsystems.

TALKING TO THE KERNEL
• Processes access kernel facilities via system calls.
• Peripherals communicate with the kernel via hardware interrupts.

EXECUTION IN USER MODE AND KERNEL MODE
• The kernel contains several data structures needed for implementing kernel services.
These structures include:
• Process table: contains an entry for every process in the system.
• Open-file table: contains at least one entry for every open file in the system.

Execution in kernel mode and user mode
• When a process executes a system call, the execution mode of the process changes from user mode to kernel mode.
– Processes in user mode can access their own instructions and data but not kernel instructions and data structures.
– In kernel mode, a process can access system data structures, such as the process table.

Flow of Control during a System call
• A user process invokes a system call (for example, open( )).
• Every system call is allocated a code number at system initialization.
– The C runtime library version of the system call places the system call parameters and the system call code number into machine registers and then executes a trap machine instruction, switching to kernel code and kernel mode.

Flow of control of a system call
• The trap instruction uses the system call number as an index into a system call vector table (located in kernel memory), which is an array of pointers to the kernel code for each system call.
• The code corresponding to the system call executes in kernel mode, modifying kernel data structures if necessary.
• The kernel then performs a special "return" instruction that flips the machine back into user mode and returns to the user process's code.

SYNCHRONOUS VS ASYNCHRONOUS PROCESSING
• Usually, processes performing system calls cannot be preempted.
• Processes must voluntarily relinquish the CPU, for example while waiting for I/O to complete.
• The kernel sends a process to sleep and wakes it up when the I/O is completed.
• The scheduler does not allocate a sleeping process any CPU time and allocates the CPU to other processes while the hardware device is servicing the I/O request.

INTERRUPTS AND EXCEPTIONS
• The UNIX system allows devices such as I/O peripherals and the clock to interrupt the CPU asynchronously.
• On receipt of the interrupt, the kernel saves its current context (a frozen image of what the process was doing), determines the cause of the interrupt and services the interrupt.
• Devices are allocated an interrupt priority based on their relative importance.
• When the kernel services an interrupt, it blocks out lower-priority interrupts but services higher-priority interrupts.

PROCESSOR EXECUTION LEVELS
• The kernel must sometimes prevent the occurrence of interrupts during critical activity to avoid corruption of data.
• Typical interrupt levels, from higher priority to lower priority:
– Machine errors
– Clock
– Disk
– Network devices
– Terminals
– Software interrupts

Interrupts
• Interrupts are serviced by kernel interrupt handlers, which must be very fast to avoid losing any interrupts.
• If an interrupt of higher priority occurs while a lower-priority interrupt is being serviced, nesting occurs and the higher-priority interrupt is serviced first.

DISK ARCHITECTURE:
• A disk is split in two ways: it is sliced like a pizza into areas called sectors
• and subdivided into concentric rings called tracks.
• Blocks are individual areas bounded by the intersection of sectors and tracks; they are the basic units of disk storage.
• Typical blocks can hold 4K bytes.

Disk architecture (cont')
• There are several variations of disk architecture: many disks contain several platters, stacked one upon the other. In these systems, a collection of tracks with the same index number is called a cylinder.
• Big issue: sequential reads are much faster than random ones (a factor of 10 to 15).
• When a sequence of contiguous blocks is read, there is a delay between each block due to the latency of the communication between the disk controller and the device driver.

Disk architecture (cont')
• Want consecutive data to be on the same track, though not consecutive on the track. See interleaving techniques, wherein consecutive blocks are three sectors apart.
• Extent file systems support large consecutive chunks at once (needed for data-intensive applications).
• I/O is always done in terms of blocks.

THE FILE SUBSYSTEM
• Support of:
– Regular files
– Directories
– Special files, which correspond to peripherals such as tapes, terminals or disks, and to interprocess communication mechanisms such as pipes and sockets

INODES
• Contains permissions, owner, group and last modification time.
• Type of file: regular, directory or special file.
• If it is a symbolic link, the value of the symbolic link.
• If it is a regular file or directory, contains the location of its disk blocks:

Inode (cont')
• Direct pointers to blocks 0 to 9.
• Indirect pointer to an entire block of pointers, which holds the locations of blocks 10 .. 1033.
• Double indirect pointer (in the primary inode) to a block that is just pointers to other blocks, each of which holds 1024 pointers to data blocks.

LAYOUT OF THE FILE SYSTEM
• The file system has the following structure:
• First logical block: boot block for starting the OS.
• Second logical block: superblock, which contains information about free blocks and the inode list.
• Next comes the inode list, which is a list of inodes. Administrators specify the size of the inode list when configuring the file system. The kernel references inodes by index into the inode list.
• The data blocks start at the end of the inode list and contain file data and administrative data.

CONVERSION OF PATHNAME TO AN INODE
• Initial access to a file is through its pathname. The kernel needs to translate a pathname to inodes to access files.
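
To make pathname-to-inode translation concrete, here is a minimal Python sketch, not the kernel's actual code. It assumes a toy in-memory file system in which each directory inode is a dictionary from names to inode numbers. The table DIRECTORIES, the constant ROOT_INODE, the function resolve and all inode numbers are hypothetical, chosen only for illustration. Symbolic links and permission checks are left out.

```python
# Toy model: directory inode number -> {component name: inode number}.
# All inode numbers and names below are made up for illustration.
ROOT_INODE = 2

DIRECTORIES = {
    2: {"usr": 5, "etc": 6},        # "/"
    5: {"bin": 9},                  # "/usr"
    6: {"passwd": 12},              # "/etc"
    9: {"ls": 20},                  # "/usr/bin"
}


def resolve(pathname, cwd_inode=ROOT_INODE):
    """Return the inode number named by pathname, or None if lookup fails."""
    # Absolute paths start at the root; relative ones at the current directory.
    working = ROOT_INODE if pathname.startswith("/") else cwd_inode
    for component in pathname.split("/"):
        if component == "":
            continue                # skip empty pieces from "/" or "//"
        entries = DIRECTORIES.get(working)
        if entries is None:
            return None             # working inode is not a directory
        if component not in entries:
            return None             # no such name in this directory
        working = entries[component]
    return working


if __name__ == "__main__":
    print(resolve("/usr/bin/ls"))          # absolute path
    print(resolve("bin/ls", cwd_inode=5))  # relative to /usr
    print(resolve("/usr/missing"))         # lookup failure
```

Each pass through the loop consumes one pathname component and replaces the working inode with the inode found under that name, so the value left at the end is the inode of the whole path.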
• The algorithm namei parses the pathname one component at a time, converting each component into an inode based on its name and the directory being searched, and eventually returns the inode of the input pathname.

Namei ALGORITHM
– If the pathname is absolute, the search starts from the root inode.
– If the pathname is relative, the search starts from the inode corresponding to the current working directory of the process (kept in the process u area).
– The components of the pathname are then processed from left to right. Every component, except the last one, should be either a directory or a symbolic link. Let's call the intermediate inodes the working inodes.

Namei algorithm (cont')
– If the working inode is a directory, the current pathname component is looked for in the directory corresponding to the working inode. If it is not found, namei returns an error; otherwise, the working inode number becomes the inode number associated with the located pathname component.

Namei (cont')
– If the working inode corresponds to a symbolic link, the pathname up to and including the current path component is replaced by the contents of the symbolic link, and the pathname is reprocessed.
– The inode corresponding to the final pathname component is the inode of the file referenced by the entire pathname.

MOUNTING FILE SYSTEMS
• When UNIX is started, the directory hierarchy corresponds to the file system located on a single disk called the root device.
• The mount utility allows a super-user to splice the root directory of a file system into the existing directory hierarchy.
• File systems created on other devices can be attached to the original directory hierarchy using the mount mechanism.

MOUNT (CONT')
• Once a mount is established, users are unaware of crossing mount points.
• A file system may be detached from the main hierarchy using the umount utility.
• Hard links do not work across mounts (System V).
• Example:
$ mount /dev/floppy /mtn
$ umount /mtn

Mount (cont')
• The kernel maintains a system-wide data structure called the mount table that allows multiple file systems to be accessed via a single directory hierarchy.
• The mount( ) and umount( ) system calls modify the table in the following manner:

MOUNT (CONT')
• With mount( ), an entry is added with:
– the device number of the device containing the file system
– a pointer to the root inode of the newly mounted file system
– a pointer to the inode of the mount point
– a pointer to the filesystem-specific mount data structure of the newly mounted file system

Umount( )
• With umount( ), several checks are made in the kernel:
– checks that there are no open files in the file system to be unmounted
– flushes the superblock and buffered inodes back to the file system
– removes the mount table entry and removes the "mount point" mark from the mount point directory

THE PROCESS SUBSYSTEM: process states
• Every process on the system can be in one of 6 states:
– running: process is currently using the CPU
– runnable: ready to run, will run depending on priority
– sleeping: waiting for an event
– suspended: stopped (e.g., as a result of Ctrl-Z)
– idle: being created by fork( ), not yet runnable
– zombie: terminated, but parent has not yet accepted its return value

Example of process state
• For example, when a process issues an I/O command, it goes to sleep, then becomes runnable again when the I/O completes and will run depending on priority.

PROCESS COMPOSITION
• code area: executable (text) portion of the process
• data area: used by the process to contain static data
• stack area: used by the process to store temporary data
• user area: holds housekeeping info
• page tables: used for memory management

USER AREA
• Every process has a private user area for housekeeping information that is used by the kernel for process management.
• It contains control and status information.
• The contents of the user area are only accessible when the process is executing in kernel mode.
• The kernel can only access the user area of the currently running process, and not the user area of other processes.

PROCESS USER AREA (CONT')
• The important fields in the user area include:
– a pointer to the process table slot of the currently executing process
– file descriptors for all open files
– internal I/O parameters
– current directory and current root
– process and file size limits
– real and effective user IDs
– an array indicating how a process reacts to signals
– how much CPU time the process has recently used

PROCESS TABLE
• The process table is a kernel data structure that contains one entry for every process in the system.
• The process table contains fields that must always be accessible to the kernel.

Process entry info
– state (running, runnable, sleeping, suspended, idle or zombified)
– process ID and parent PID
– its real and effective user ID and group ID (GID)
– location of its code, data, stack and user areas
– a list of all pending signals
– various timers that give process execution time and kernel resource utilization

THE SCHEDULER
• The scheduler is responsible for sharing CPU time between competing processes.
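
One way to picture CPU sharing is a round-robin loop over priority queues with a fixed time quantum. The Python sketch below is a simplified simulation under stated assumptions, not the actual UNIX scheduler: priorities never change, no process sleeps, and a lower priority number is assumed to mean higher priority. The names run_scheduler and QUANTUM and the job tuples are hypothetical.

```python
from collections import deque

QUANTUM = 2  # time units a process may run before being preempted (assumed)


def run_scheduler(jobs, quantum=QUANTUM):
    """Simulate scheduling; jobs is a list of (name, priority, cpu_needed).

    Lower priority numbers are served first (an assumption of this sketch).
    Returns the list of (name, time_run) slices in the order they ran.
    """
    queues = {}  # priority level -> queue of [name, remaining time]
    for name, priority, needed in jobs:
        queues.setdefault(priority, deque()).append([name, needed])

    timeline = []
    while queues:
        level = min(queues)               # highest-priority non-empty queue
        queue = queues[level]
        name, remaining = queue.popleft()
        ran = min(quantum, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            # Quantum used up: preempt and feed back into the same queue.
            queue.append([name, remaining - ran])
        if not queue:
            del queues[level]             # drop empty queues
    return timeline


if __name__ == "__main__":
    for name, ran in run_scheduler([("A", 1, 3), ("B", 1, 2), ("C", 2, 1)]):
        print(name, ran)
```

In this simulation a process that still needs CPU time after its quantum goes to the back of its own queue, so processes at the same level take turns, while lower-priority queues wait until the higher ones are empty.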
• The scheduler maintains a multilevel priority queue that allows it to schedule processes efficiently and follows a specific algorithm for selecting which process should be running.

Scheduling rules:
• The kernel allocates the CPU to a process for a time quantum, preempts a process that exceeds its time quantum and feeds it back into one of the several priority queues.
• During every second, processes in the highest-priority non-empty queue are allocated the CPU in a round-robin fashion.

Scheduler (cont')
• To support real-time processes, the scheduler needs to be changed so that scheduling is based on priority inheritance rather than time quanta. Also, more preemption points in the kernel are needed.

Context Switch:
• To switch from one process to another, the kernel saves the process's program counter, stack pointer and other important info in the process's user area.
• When the process is ready to run again, the kernel retrieves this info from the process's user area.

Loading an executable
• A user compiles the source code of a program to create an executable file, which consists of several parts:
– a set of "headers" that describe the attributes of the file
– program text
– machine language representation
– other sections, such as symbol table information

Loading an executable
• The kernel loads an executable file into memory during an exec( ) system call.
• The loaded process contains at least 3 parts, called regions:
– Text corresponds to the text sections of the executable file.
– Data corresponds to the data section of the executable file.
– Stack is automatically created and its size is dynamically adjusted by the kernel at run time.

Loading an executable
• The compiler generates addresses for a virtual address space with a given address range.
• The Memory Management Unit translates virtual addresses generated by the compiler into addresses of physical memory.

THE BOOT and INIT PROCESS
• The administrator initializes the system through a bootstrap sequence.
• On a UNIX system, the bootstrap sequence eventually reads the boot block (block 0) of a disk and loads it into memory.
• The program contained in the boot block loads the kernel from the file system (for example, /unix).
• After the kernel is loaded into memory, the boot program transfers control to the start address of the kernel and the kernel starts running.

Boot process (cont')
• After initialization, the kernel mounts the root file system and handcrafts the environment for process 0.
• Process 0 forks from within the kernel.
• Process 1, running in kernel mode, creates its user-level context by allocating a data region and attaching it to its address space.

Boot process (cont')
• Process 1 copies code from kernel space to the new region, which forms the new user-level context of process 1.
• Process 1 sets up the saved user register context, "returns" from kernel mode and executes the code just copied from the kernel.
• Process 1 is now a user-level process, and its text consists of a call to exec the /etc/init program.
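
The fork-then-exec pattern that ends the boot sequence is the same one user programs use to start other programs. As a rough user-level illustration (not kernel code), the Python sketch below uses the standard os.fork, os.execvp and os.waitpid wrappers around the corresponding system calls. The helper name run_command is hypothetical, and a POSIX system with the true and false utilities on PATH is assumed.

```python
import os


def run_command(argv):
    """Fork a child that execs argv; return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child process: replace this process image with the command.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)        # exec failed; exit without returning
    # Parent process: wait so the child does not linger as a zombie.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1                    # child was killed by a signal


if __name__ == "__main__":
    print(run_command(["true"]))
    print(run_command(["false"]))
```

The parent's call to os.waitpid is what collects the child's return value; until that happens, a terminated child remains in the zombie state.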