W4118 Operating Systems
Instructor: Junfeng Yang

Logistics
• Homework 2 out: system call fault injector
  – System calls can fail for a variety of reasons
  – Many programs must correctly handle these failures
  – Our fault injector can test this
  – How? You add a system call, fail(), that makes one of the future system calls a process issues fail
• We'll use a VM for kernel programming assignments
• Three ways to get the class VM
  – Download from the course website
  – Go to office hours and copy from me or the TAs
  – We'll make a few DVDs
• Who is looking for teammates?

Last lecture
• Process: a good way to manage concurrent activities
• Address space
• Mechanism: process dispatching
  – Dispatcher gains control via periodic timer interrupt
  – Dispatcher saves process state to PCB on context switch
  – Dispatcher maintains scheduling queues of processes
• Policy: process scheduling
• Common process operations
  – Process creation

Today
• Processes (cont.)
  – Process termination
  – Interprocess communication
• Processes in Linux
  – Where is the relevant code
  – task_struct
  – Context switch in Linux: switch stack

Process Termination
• Process executes its last statement and asks the operating system to delete it (exit(int status))
• In exit():
  – OS notifies the parent process of the child's exit status
    Parent gets this status via wait(int *stat_loc)
    0: success, non-zero: failure
  – OS deallocates the child's resources
• Processes can be terminated by other processes
  – E.g. a parent may terminate execution of its child processes
    Child has exceeded allocated resources
    Task assigned to child is no longer required
    Parent is exiting: some operating systems do not allow a child to continue if its parent terminates, so all children are terminated (cascading termination)

Notes on UNIX Process Termination
• What if the child exits before the parent?
  – Child process becomes a zombie process
  – Parent must call wait() to "reap" the child
  – OS will notify the parent about the child's termination
• What if the parent exits before the child?
  – Orphaned processes
  – Re-parented to process 1, the init process

    while (1) {
        write(1, "$ ", 2);
        parse_cmd(command, args);      // parse user input
        switch (pid = fork()) {
        case -1:
            perror("fork");
            break;
        case 0:                        // child
            execv(command, args);      // returns only on failure
            perror("execv");
            _exit(1);                  // do not fall back into the shell loop
        default:                       // parent
            wait(0);                   // wait for child to terminate
            break;
        }
    }

Today
• Processes (cont.)
  – Process termination
  – Interprocess communication
• Processes in Linux
  – Relevant files
  – Data structures
  – Context switch implementation

Cooperating Processes
• An independent process cannot affect or be affected by the execution of another process
• A cooperating process can affect or be affected by the execution of another process
• Advantages of process cooperation
  – Information sharing
  – Computation speed-up
  – Modularity/convenience

Interprocess Communication Models
• Message passing
• Shared memory

Message Passing vs. Shared Memory
• Message passing
  – Why good? Simpler. All sharing is explicit
  – Why bad? Overhead. Data copying, crossing protection domains
• Shared memory
  – Why good? Performance. Set up shared memory once, then access without crossing protection domains
  – Why bad? Synchronization

IPC Example: Unix signals
• Signals
  – A very short message: just a small integer
  – A fixed set of available signals. Examples:
    2: SIGINT, sent (usually) when you press Ctrl+C
    9: SIGKILL, to kill a process
    11: SIGSEGV, sent on an invalid memory access
• Send a signal to a process
  – kill(pid_t pid, int sig)
  – Signals can be sent by users, the kernel, or other processes
• What to do when receiving a signal?
• Installing a handler for a signal
  – sighandler_t signal(int signum, sighandler_t handler);

IPC Example: Unix pipe
• int pipe(int fd[2]);
  – Returns two file descriptors in fd[0] and fd[1]
  – Writes to fd[1] will be read on fd[0]
  – When the last copy of fd[1] is closed, fd[0] will return EOF
  – Returns 0 on success, -1 on error
• Operations on pipes: read/write/close, as with files
  – When fd[1] is closed, read(fd[0]) returns 0 bytes
  – When fd[0] is closed, write(fd[1]) kills the process with SIGPIPE or, if SIGPIPE is blocked or ignored, fails with EPIPE

IPC Example: Unix pipe (cont.)

    int pipefd[2];
    pipe(pipefd);
    switch (pid = fork()) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                  // child
        close(pipefd[0]);
        // write to pipefd[1]
        break;
    default:                 // parent
        close(pipefd[1]);
        // read from pipefd[0]
        break;
    }

IPC Example: Unix Shared Memory
• int shmget(key_t key, size_t size, int shmflg);
  – Create a shared memory segment, and return its id
  – key: unique identifier of a shared memory segment, or IPC_PRIVATE (means create a new shared memory segment)
• void *shmat(int shmid, const void *addr, int flg);
  – Attach the shared memory segment to the address space of the calling process. Returns a pointer to the shared memory
  – shmid: id returned by shmget()
• int shmdt(const void *shmaddr);
  – Detach from shared memory

IPC Example: Unix Shared Memory (cont.)

    int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0666);
    int *x = (int *)shmat(id, NULL, 0);
    *x = 0;
    switch (pid = fork()) {
    case -1:
        perror("fork");
        exit(1);
    case 0:                  // child: increment the shared counter
        while (1) { ++*x; sleep(1); }
    default:                 // parent: print the shared counter
        while (1) { printf("x = %d\n", *x); sleep(1); }
    }

• Problem: synchronization! (later)

Today
• Processes (cont.)
  – Process termination
  – Interprocess communication
• Processes in Linux
  – Process data structures
  – Process operations: fork() and exit()
  – Context switch implementation

Find process info: /proc/<pid>
• Use ps to get a process ID
• For each process, the /proc pseudo file system has a corresponding directory /proc/<pid> that holds information about that process

Process-related files
• Header files
  – include/linux/sched.h – declarations for most task data structures
  – include/linux/wait.h – declarations for wait queues
  – include/asm-i386/system.h – architecture-dependent declarations
• Source files
  – kernel/sched.c – task scheduling routines
  – kernel/signal.c – signal handling routines
  – kernel/fork.c – process/thread creation routines
  – kernel/exit.c – process exit routines
  – fs/exec.c – executing programs
  – arch/i386/kernel/entry.S – kernel entry points
  – arch/i386/kernel/process.c – architecture-dependent process routines
• http://lxr.linux.no/

Linux: Processes or Threads?
• Linux uses a neutral term: tasks
  – Tasks represent both processes and threads
  – Threads = tasks that share address-space (AS) data structures
• When processes trap into the kernel, they share the Linux kernel's address space
  – kernel threads

Task data structure
• task_struct: process control block
• kernel stack: work space for system calls (the kernel executes on the user process's behalf) or interrupt handlers

Process Control Block in Linux
• task_struct (called the process descriptor in ULK, Understanding the Linux Kernel)
  – include/linux/sched.h
  – Each task has a unique task_struct
• http://lxr.linux.no/linux+v2.6.11/

Task States: state
• TASK_RUNNING – the thread is running on the CPU or is waiting to run
• TASK_INTERRUPTIBLE – the thread is sleeping and can be awakened by a signal (EINTR)
• TASK_UNINTERRUPTIBLE – the thread is sleeping and cannot be awakened by a signal
• TASK_STOPPED – the process has been stopped by a signal or by a debugger
• TASK_TRACED – the process is being traced via the ptrace system call
• include/linux/sched.h

Exit States
• EXIT_ZOMBIE – the process is exiting but has not yet been waited for by its
  parent
• EXIT_DEAD – the process has exited and has been waited for

Process IDs
• process ID: pid
• thread group ID: tgid
  – pid of the first thread in the process
  – getpid() returns this ID, so all threads in a process share the same process ID
• Many system calls identify a process by its PID
  – The Linux kernel uses pidhash to efficiently find processes by PID (see include/linux/pid.h, kernel/pid.c)

Other PCB data structures
• user: user_struct – per-user information (for example, number of current processes)
• mm, active_mm: mm_struct – memory areas for the process (address space)
• fs: fs_struct – current and root directories associated with the process
• files: files_struct – file descriptors for the process
• signal: signal_struct – signal structures associated with the process

Process Relationships
• Processes are related: children, sibling
  – Parent/child (fork()), siblings
  – Possible to "re-parent": parent vs. original parent
  – Parent can "wait" for child to terminate
• Process groups: signal_struct->pgrp
  – Possible to send signals to all members
• Sessions: signal_struct->session
  – Processes related to login

How Linux manages processes
• To schedule its various "tasks" efficiently, Linux maintains separate queues for "running" tasks and for tasks that are temporarily "blocked" while waiting for a particular event to occur (such as the arrival of new data from the keyboard, or the exhaustion of prior data sent to the printer)
• These queues are implemented using doubly linked lists (struct list_head in include/linux/list.h)

Some tasks are "ready-to-run"
(Figure labels: init_task list, run_queue)
• Those tasks that are ready-to-run comprise a sub-list of all the tasks, and they are arranged on a queue known as the "run-queue" (struct runqueue in kernel/sched.c)
• Those tasks that are blocked while awaiting a specific event are put on alternative sub-lists, called "wait queues", associated with the particular event(s) that will allow a blocked task to be unblocked (wait_queue_t in
  include/linux/wait.h and kernel/wait.c)

Kernel Wait Queues
(Figure labels: several waitqueues, each drawn as a wait_queue_head_t with wait_queue_t elements)
• A wait_queue_head_t can have 0 or more wait_queue_t chained onto it
  – However, usually just one element
• Each wait_queue_t contains a list_head of tasks
  – All processes waiting for a specific "event"
• Used for timing, synchronization, device I/O, etc.

Kernel stack
• Each process in Linux has two stacks, a user stack and a kernel stack (8KB by default)
• The kernel stack can only be accessed in kernel mode
• Interrupt and trap handlers run on the kernel stack
  – The user stack cannot be trusted
• Q1: switching address spaces is costly. Can we avoid this overhead when entering kernel mode from user mode?
• Q2: how does the hardware find the current task's kernel stack?

Q1: A task's virtual-memory layout

    4G
        Kernel space (kernel mode): process descriptor and kernel-mode stack
    3G
        User space (user mode): user-mode stack area, shared runtime libraries, task's code and data
    0

• Kernel space is also mapped into every task's address space, so going from user mode to kernel mode does not require switching address spaces
• Protection? Kernel space is only accessible when mode bit = 0

Q2: Finding current task's kernel stack (on x86)
(Figure labels: CPU0 tr, CPU0 esp, Global Descriptor Table, kern stack top, task's 8-KB kernel stack)
• Global Descriptor Table: initialized in startup_32 in arch/i386/boot/compressed/head.S
• Hardware retrieves the kernel stack top and loads it into %esp; it also saves the previous %esp for the return to user mode
• The kernel stack top changes on each context switch (__switch_to in arch/i386/kernel/process.c)
• Still need to find task_struct!

Connections between task_struct and kernel stack
(Figure labels: esp; task's 8-KB kernel stack, 8KB aligned, spanning 0xe800e000 to 0xe8010000; task's thread-info at 0xe800e000; task's process descriptor, struct task_struct)
• Linux uses part of a task's kernel stack to store a structure, thread_info
• thread_info contains low-level data that low-level code (e.g. entry.S) can access immediately, and a pointer to the task's task_struct

How to find thread_info?

    movl $0xFFFFE000, %eax
    andl %esp, %eax          # mask out the last 13 bits

• Because the kernel stack is 8KB aligned, any %esp inside the stack (0xe800e000 up to, but not including, 0xe8010000) masks down to 0xe800e000, the address of the task's thread-info

How to find thread_info? (cont)
• Macro current_thread_info implements this computation
• The current macro yields the task_struct of the current task
  – include/asm-i386/current.h
• Why good?
  – Fast! 2 instructions to find current from %esp
  – current is not a static variable, which is useful for SMP
• http://lxr.linux.no/linux+v2.6.11/

Today
• Processes (cont.)
  – Process termination
  – Interprocess communication
• Processes in Linux
  – Process data structures
  – Process operations: fork() and exit()
  – Context switch implementation

fork() call chain

    libc fork()
      system_call (arch/i386/kernel/entry.S)
        sys_clone() (arch/i386/kernel/process.c)
          do_fork() (kernel/fork.c)
            copy_process() (kernel/fork.c)
              p = dup_task_struct(current)   // shallow copy
              copy_*                         // copy pointed-to structures
              copy_thread()                  // copy stack, regs, and eip,
                                             // and set child return value
                                             // to 0 via
                                             // childregs->eax = 0;
            wake_up_new_task()               // set child runnable

exit() call chain

    libc exit(code)
      system_call (arch/i386/kernel/entry.S)
        sys_exit() (kernel/exit.c)
          do_exit() (kernel/exit.c)
            exit_*()                         // free data structures
            exit_notify()                    // tell other processes we exit
                                             // reparent children to init
                                             // EXIT_ZOMBIE
                                             // EXIT_DEAD

Today
• Processes (cont.)
  – Process termination
  – Interprocess communication
• Processes in Linux
  – Process data structures
  – Process operations: fork() and exit()
  – Context switch implementation

Context switch call chain

    schedule() (kernel/sched.c)              // talk about scheduling later
      context_switch()
        switch_mm (include/asm-i386/mmu_context.h)   // switch address space
        switch_to (include/asm-i386/system.h)        // switch stack, regs, and %eip
          __switch_to (arch/i386/kernel/process.c)

Context switch by stack switch: the idea
(Figure labels: task's kernel stack, task's thread-info, task's process descriptor)
• The kernel stack captures the process state
  – Registers
  – task_struct, through thread_info
• Changing the stack pointer changes the process

Context switch by stack switch: the implementation (simplified)
(Figure labels: P0 stack and P1 stack, each holding saved registers (eax, …) and a thread_info; ret_addr on P0's stack; saved esp and eip per task, with p0->eip = ret_addr; CPU registers esp, eip, eax, …)

    switch_to(p0, p1)
        save registers on stack
        p0->esp = %esp
        p0->eip = ret_addr;
        %esp = p1->esp;
        push p1->eip;
        ret
    ret_addr:
        pop registers from stack
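
The stack-switch idea above can be tried in user space. The following is a minimal sketch, not the kernel's switch_to: it assumes Linux with glibc and uses the POSIX ucontext calls getcontext(), makecontext() and swapcontext(), where swapcontext() saves the caller's registers and stack pointer and loads another context's saved ones. The names run_demo, task_body, task_ctx, task_stack, STACK_SIZE and STEPS are hypothetical, chosen only for this illustration.

```c
#include <stdio.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)   /* assumed size, ample for this demo */
#define STEPS 3

static ucontext_t main_ctx;             /* saved state of run_demo()'s caller side */
static ucontext_t task_ctx[2];          /* saved state of the two tasks */
static char task_stack[2][STACK_SIZE];  /* each task runs on its own stack */
static int steps_run;

/* Body of each task: do one step, then switch to the other task. */
static void task_body(int id)
{
    for (int i = 0; i < STEPS; i++) {
        printf("task %d step %d\n", id, i);
        steps_run++;
        /* Save our registers and stack pointer in task_ctx[id], then load
         * the other task's saved state: a user-level analog of switch_to. */
        swapcontext(&task_ctx[id], &task_ctx[1 - id]);
    }
    /* Returning from here resumes uc_link, which is main_ctx. */
}

/* Runs the two tasks until task 0 finishes; returns the number of steps run.
 * Task 1 is left suspended inside its last swapcontext() call. */
int run_demo(void)
{
    steps_run = 0;
    for (int id = 0; id < 2; id++) {
        getcontext(&task_ctx[id]);                    /* initialize the context */
        task_ctx[id].uc_stack.ss_sp = task_stack[id]; /* give it a private stack */
        task_ctx[id].uc_stack.ss_size = STACK_SIZE;
        task_ctx[id].uc_link = &main_ctx;             /* where to go on return */
        makecontext(&task_ctx[id], (void (*)(void))task_body, 1, id);
    }
    swapcontext(&main_ctx, &task_ctx[0]);             /* start task 0 */
    return steps_run;
}
```

Calling run_demo() from a main function prints the steps of task 0 and task 1 interleaved, and the call returns 2 * STEPS = 6. Each task keeps its loop counter i on its own stack, so switching stacks is enough to switch between the two activities, which is the point of the slide.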