W4118 Operating Systems Instructor: Junfeng Yang

Logistics

Homework 2 out: system call fault injector
• System calls can fail for a variety of reasons
• Many programs must correctly handle these failures
• Our fault injector can test this
• How? You add a system call, fail(), to fail one of the future system calls a process issues

We'll use a VM for kernel programming assignments
Three ways to get the class VM
• Download from the course website
• Go to office hours and copy from me or TAs
• We'll make a few DVDs

Who's looking for teammates?
Last lecture

Process: a good way to manage concurrent
activities


Address space
Mechanism: Process dispatching
• Policy: process scheduling
• Dispatcher gains control via periodic timer interrupt
• Dispatcher saves process state to PCB on context
switch
• Dispatcher maintains scheduling queues of processes

Common process operations
• Process creation
Today

Processes (cont.)



Process termination
Interprocess communication
Processes in Linux



Where is the relevant code
task_struct
Context Switch in Linux. switch stack.
Process Termination

Process executes last statement and asks the
operating system to delete it (exit(int status)). In
exit():
OS notifies the parent process of the child's exit status
• Parent gets this status via wait(int* stat_loc)
• 0: success, non-zero: failure


OS deallocates child’s resources
Processes can be terminated by other processes

E.g. Parent may terminate execution of child processes
• Child has exceeded allocated resources
• Task assigned to child is no longer required
• If parent is exiting
– Some operating systems do not allow a child to continue if its parent terminates: all children are terminated (cascading termination)
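The exit-status handoff between child and parent can be sketched as follows. This is a minimal illustration, not part of the original slides: the helper name child_exit_status is hypothetical, and it uses waitpid() with the standard WIFEXITED/WEXITSTATUS macros rather than plain wait().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; return the status the parent
 * collects, or -1 on error. Hypothetical helper for illustration. */
int child_exit_status(int code)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;                  /* fork failed */
    if (pid == 0)
        _exit(code);                /* child: terminate immediately */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status))
        return -1;                  /* child was killed by a signal */
    return WEXITSTATUS(status);     /* low 8 bits of the exit code */
}
```

Only the low 8 bits of the exit code reach the parent, so codes should stay in the range 0 to 255.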
Notes on UNIX Process Termination

What if child exits before parent?



Child process becomes a zombie process
Parent must call wait() to “reap” child. OS will notify parent about
child’s termination
What if parent exits before child?


Orphaned processes
Re-parented to process 1, the init process
while (1) {
    write(1, "$ ", 2);                 // print the prompt
    parse_cmd(command, args);          // parse user input
    switch (pid = fork()) {
    case -1:                           // fork failed
        perror("fork"); break;
    case 0:                            // child: run the command
        execv(command, args);          // returns only on failure
        perror("execv"); _exit(127);
    default:                           // parent
        wait(0); break;                // wait for child to terminate
    }
}
Today

Processes (cont.)



Process termination
Interprocess communication
Processes in Linux



Relevant files
Data structures
Context switch implementation
Cooperating Processes



Independent process cannot affect or be
affected by the execution of another process.
Cooperating process can affect or be affected
by the execution of another process
Advantages of process cooperation



Information sharing
Computation speed-up
Modularity/Convenience
Interprocess Communication Models
Message Passing
Shared Memory
Message Passing vs. Shared Memory

Message passing



Why good? Simpler. All sharing is explicit
Why bad? Overhead. Data copying, crossing protection domains
Shared Memory


Why good? Performance. Set up shared memory
once, then access w/o crossing protection domains
Why bad? Synchronization
IPC Example: Unix signals

Signals


A very short message: just a small integer
A fixed set of available signals. Examples:
• 2: SIGINT, sent (usually) when you press ctrl+C
• 9: SIGKILL, to kill a process
• 11: SIGSEGV, sent when there is a memory error

Send a signal to a process



kill(pid_t pid, int sig)
Signal can be sent by users, kernel, or other processes
What to do when receiving a signal? Install a handler for the signal

sighandler_t signal(int signum, sighandler_t handler);
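A minimal sketch of installing a handler with signal() and sending a signal with kill(). The choice of SIGUSR1 and the names on_signal and signal_roundtrip are assumptions made for illustration; the sketch also assumes a single-threaded process with SIGUSR1 unblocked.

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Written by the handler, read by normal code. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signum)
{
    got_signal = signum;    /* keep handlers tiny and async-signal-safe */
}

/* Install a handler, signal ourselves, and return the signal number
 * the handler recorded (or -1 on error). Hypothetical helper. */
int signal_roundtrip(void)
{
    got_signal = 0;
    if (signal(SIGUSR1, on_signal) == SIG_ERR)
        return -1;
    if (kill(getpid(), SIGUSR1) == -1)
        return -1;
    /* In a single-threaded process with SIGUSR1 unblocked, the handler
     * has already run by the time kill() returns. */
    int seen = got_signal;
    signal(SIGUSR1, SIG_DFL);   /* restore the default action */
    return seen;
}
```

The handler only stores an integer because most library functions, including printf(), are not safe to call from a signal handler.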
IPC Example: Unix pipe

int pipe(int fd[2]);





Returns two file descriptors in fd[0] and fd[1];
Writes to fd[1] will be read on fd[0]
When last copy of fd[1] closed, fd[0] will return
EOF
Return 0 on success, -1 on error
Operations on pipes:



read/write/close --- as with files
When fd[1] closed, read(fd[0]) returns 0 bytes
When fd[0] closed, write(fd[1]):
• Kills process with SIGPIPE, or
• Fails with EPIPE if SIGPIPE is blocked or ignored
IPC Example: Unix pipe (cont.)
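int pipefd[2];
pid_t pid;
if (pipe(pipefd) == -1) { perror("pipe"); exit(1); }
switch (pid = fork()) {
case -1: perror("fork"); exit(1);
case 0: close(pipefd[0]);    // child: close unused read end
        // write to pipefd[1]
        break;
default: close(pipefd[1]);   // parent: close unused write end
        // read from pipefd[0]
        break;
}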
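The skeleton above can be filled in roughly as follows: the child writes a message and the parent reads until EOF. This is a sketch under stated assumptions; the function name pipe_roundtrip and its buffer-passing interface are hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child writes msg into a pipe; parent reads it into buf (NUL-terminated).
 * Returns the number of bytes read, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *buf, size_t cap)
{
    int pipefd[2];
    if (cap == 0 || pipe(pipefd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: writer */
        close(pipefd[0]);
        size_t len = strlen(msg);
        ssize_t n = write(pipefd[1], msg, len);
        close(pipefd[1]);
        _exit(n == (ssize_t)len ? 0 : 1);
    }

    close(pipefd[1]);   /* parent must close its write end to ever see EOF */
    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(pipefd[0], buf + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(pipefd[0]);
    waitpid(pid, NULL, 0);              /* reap the child */
    return (ssize_t)total;
}
```

If the parent kept its copy of pipefd[1] open, read() would block forever after the child's data, because the last copy of the write end would never be closed.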
IPC Example: Unix Shared Memory

int shmget(key_t key, size_t size, int shmflg);
Create a shared memory segment, and return its id
key: unique identifier of a shared memory segment, or IPC_PRIVATE (means create a new shared mem seg)

void *shmat(int shmid, const void *addr, int flg);
Attach shared memory segment to address space of the calling process. Return a pointer to shared memory
shmid: id returned by shmget()

int shmdt(const void *shmaddr);
Detach from shared memory
IPC Example: Unix Shared Memory (cont.)
int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0666);
int *x = (int *)shmat(id, NULL, 0);   // attached before fork(): both processes share *x
pid_t pid;
*x = 0;
switch (pid = fork()) {
case -1: perror("fork"); exit(1);
case 0: while (1) { ++*x; sleep(1); }                      // child: writer
default: while (1) { printf("x = %d\n", *x); sleep(1); }   // parent: reader
}
Problem: synchronization! (later)
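A terminating variant of the example, sketched under assumptions: the helper name shm_counter is hypothetical, and the only synchronization is that the parent reads the shared integer after waitpid() reports the child has exited. It also removes the segment with shmctl(IPC_RMID), which the slides do not cover.

```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child increments a shared int n times; parent reads it after the
 * child exits. Returns the final value, or -1 on error. */
int shm_counter(int n)
{
    int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (id == -1)
        return -1;
    int *x = shmat(id, NULL, 0);
    if (x == (void *)-1) {              /* shmat signals failure with -1 */
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    *x = 0;

    pid_t pid = fork();
    if (pid == 0) {                     /* child: writer */
        for (int i = 0; i < n; i++)
            ++*x;
        shmdt(x);
        _exit(0);
    }

    int result = -1;
    if (pid > 0 && waitpid(pid, NULL, 0) == pid)
        result = *x;                    /* safe: the child has finished */
    shmdt(x);                           /* detach from our address space */
    shmctl(id, IPC_RMID, NULL);         /* mark the segment for removal */
    return result;
}
```

Waiting for the child to exit sidesteps the synchronization problem here; two processes updating *x concurrently would need real synchronization.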
Today

Processes (cont.)



Process termination
Interprocess communication
Processes in Linux



Process data structures
Process operations: fork() and exit()
Context switch implementation
Find process info: /proc/<pid>


ps to get process id
For each process, there is a corresponding
directory /proc/<pid> to store this process
information in the /proc pseudo file system
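As a small illustration of the /proc/<pid> layout, the sketch below reads the first field of /proc/<pid>/stat, which is the process id itself. The helper name pid_from_proc is hypothetical, and the code assumes a Linux system with /proc mounted.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the pid recorded as the first field of /proc/<pid>/stat,
 * or -1 if the file cannot be read. */
long pid_from_proc(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;                      /* no such process, or no /proc */

    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling pid_from_proc(getpid()) should return the caller's own pid; the same directory also holds files such as status and maps.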
Process-related files


Header files
include/linux/sched.h – declarations for most task data structures
include/linux/wait.h – declarations for wait queues
include/asm-i386/system.h – architecture-dependent declarations
Source files








kernel/sched.c – task scheduling routines
kernel/signal.c – signal handling routines
kernel/fork.c – process/thread creation routines
kernel/exit.c – process exit routines
fs/exec.c – executing program
arch/i386/kernel/entry.S – kernel entry points
arch/i386/kernel/process.c – architecture-dependent
process routines
http://lxr.linux.no/
Linux: Processes or Threads?

Linux uses a neutral term: tasks




Tasks represent both processes and threads
Threads = tasks that share address space (AS) data structures
When processes trap into the kernel, they share the Linux kernel's address space → kernel threads
Task data structure


task_struct: process control block
kernel stack: work space for system calls (the
kernel executes on the user process’s behalf) or
interrupt handlers
Process Control Block in Linux

task_struct (process descriptor in ULK)



include/linux/sched.h
Each task has a unique task_struct
http://lxr.linux.no/linux+v2.6.11/
Task States: state






TASK_RUNNING – the thread is running on the CPU or is
waiting to run
TASK_INTERRUPTIBLE – the thread is sleeping and can be
awoken by a signal (EINTR)
TASK_UNINTERRUPTIBLE – the thread is sleeping and cannot
be awakened by a signal
TASK_STOPPED – the process has been stopped by a signal or
by a debugger
TASK_TRACED – the process is being traced via the ptrace
system call
include/linux/sched.h
Exit States


EXIT_ZOMBIE – the process is exiting but has not yet been
waited for by its parent
EXIT_DEAD – the process has exited and has been waited
for
Process IDs


process ID: pid
thread group ID: tgid





pid of first thread in process
getpid() returns this ID, so all threads in a process
share the same process ID
many system calls identify a process by its
PID
Linux kernel uses pidhash to efficiently find
processes by pids
(see include/linux/pid.h, kernel/pid.c)
Other PCB data structures





user: user_struct – per-user information (for
example, number of current processes)
mm, active_mm: mm_struct – memory areas
for the process (address space)
fs: fs_struct – current and root directories
associated with the process
files: files_struct – file descriptors for the
process
signal: signal_struct – signal structures
associated with the process
Process Relationships

Processes are related: children, siblings

Parent/child (fork()), siblings
Possible to "re-parent"
• Parent vs. original parent
Parent can "wait" for child to terminate

Process groups: signal_struct->pgrp
Possible to send signals to all members

Sessions: signal_struct->session
Processes related to login
How Linux manages processes


In order for Linux to efficiently manage the
scheduling of its various ‘tasks’, separate
queues are maintained for ‘running’ tasks and
for tasks that temporarily are ‘blocked’ while
waiting for a particular event to occur (such as
the arrival of new data from the keyboard, or
the exhaustion of prior data sent to the
printer)
These queues are implemented using doubly linked lists (struct list_head in include/linux/list.h)
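The intrusive-list idea behind struct list_head can be sketched in user space. This is a simplified re-implementation for illustration, not the kernel's code: struct task is a hypothetical stand-in for task_struct, and this list_del resets the removed node instead of poisoning its pointers as the kernel does.

```c
#include <stddef.h>

/* Simplified userspace copy of the kernel's intrusive list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing struct from a pointer to its embedded node. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;     /* an empty list points to itself */
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

static void list_del(struct list_head *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = node->prev = node;
}

/* Hypothetical stand-in for task_struct: the list node is embedded. */
struct task {
    int pid;
    struct list_head run_list;
};

/* Queue three tasks, dequeue the middle one, sum the remaining pids. */
int run_queue_demo(void)
{
    struct list_head queue;
    struct task a = { .pid = 1 }, b = { .pid = 2 }, c = { .pid = 3 };
    int sum = 0;

    list_init(&queue);
    list_add_tail(&a.run_list, &queue);
    list_add_tail(&b.run_list, &queue);
    list_add_tail(&c.run_list, &queue);
    list_del(&b.run_list);              /* e.g. task b blocks */

    for (struct list_head *p = queue.next; p != &queue; p = p->next)
        sum += container_of(p, struct task, run_list)->pid;
    return sum;                         /* 1 + 3 */
}
```

Because the node lives inside the task, one task can sit on several lists at once by embedding several list_head fields, and no separate allocation is needed per list entry.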
Some tasks are ‘ready-to-run’
init_task list
run_queue
Those tasks that are ready-to-run comprise a sub-list of all the tasks,
and they are arranged on a queue known as the ‘run-queue’
(struct runqueue in kernel/sched.c)
Those tasks that are blocked while awaiting a specific event to occur
are put on alternative sub-lists, called ‘wait queues’, associated with
the particular event(s) that will allow a blocked task to be unblocked
(wait_queue_t in include/linux/wait.h and kernel/wait.c)
Kernel Wait Queues
[Figure: several wait queues, each a wait_queue_head_t with wait_queue_t elements chained onto it]
wait_queue_head_t can have 0 or more wait_queue_t chained onto them
However, usually just one element
Each wait_queue_t contains a list_head of tasks
All processes waiting for specific "event"
Used for timing, synch, device i/o, etc.
Kernel stack

Each process in Linux has two stacks, a user
stack and a kernel stack (8KB by default)


Kernel stack can only be accessed in kernel mode
Interrupt and trap handlers run on kernel stack
• User stack cannot be trusted


Q1: switching address spaces is costly. Can we
avoid this overhead when entering kernel mode
from user mode?
Q2: how does the hardware find the current
task’s kernel stack ?
Q1: A task’s virtual-memory layout
[Figure: a task's 4G virtual address space. User space (user mode) runs from 0 to 3G and holds the task's code and data, the shared runtime libraries, and the user-mode stack area. Kernel space (kernel mode) runs from 3G to 4G and holds the process descriptor and kernel-mode stack.]
Kernel space is also mapped into every task's address space → going from user mode to kernel mode, no need to switch address spaces
Protection? Kernel space is only accessible when mode bit = 0
Q2: Finding current task’s kernel stack
(on x86)
[Figure: on CPU0, the tr register leads through the Global Descriptor Table to the kernel stack top of the current task's 8-KB kernel stack, which is loaded into esp]
Global Descriptor Table initialized in startup_32 in arch/i386/boot/compressed/head.S
Hardware retrieves the kernel stack top and loads it into %esp, and also saves the previous %esp for the return to user mode
The kernel stack top changes on each context switch (__switch_to in arch/i386/kernel/process.c)
Still need to find task_struct!
Connections between task_struct and
kernel stack


Linux uses part of a task's kernel stack to store a thread_info structure
thread_info contains low-level data that low-level code (e.g. entry.S) can access immediately, and a pointer to the task's task_struct
[Figure: the task's 8-KB kernel stack is 8KB aligned and spans 0xe800e000 to 0xe8010000. The task's thread_info sits at the bottom, at 0xe800e000, and points to the struct task_struct (the task's process descriptor). esp points into the stack.]
How to find thread_info?
movl $0xFFFFE000, %eax
andl %esp, %eax        (mask out last 13 bits)
[Figure: the task's 8-KB kernel stack, 8KB aligned, from 0xe800e000 to 0xe8010000, with the task's thread_info at 0xe800e000 pointing to the struct task_struct (the task's process descriptor); masking esp yields 0xe800e000]
How to find thread_info? (cont)

Macro current_thread_info implements this
computation
current macro yields the task_struct of current task
include/asm-i386/current.h

Why good?





Fast ! 2 instructions to find current from %esp
current is not a static variable, useful for SMP
http://lxr.linux.no/linux+v2.6.11/
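The masking arithmetic can be checked in plain C. This is a user-space sketch of the computation only, not the kernel's current_thread_info; the function name thread_info_base is hypothetical, and the constant mirrors the 8 KB stack size used on the slides.

```c
#include <stdint.h>

#define THREAD_SIZE 8192u   /* 8 KB kernel stack, as on the slides */

/* Base address of the 8 KB-aligned region that contains esp. */
uint32_t thread_info_base(uint32_t esp)
{
    /* ~(THREAD_SIZE - 1) == 0xFFFFE000: clears the low 13 bits. */
    return esp & ~(THREAD_SIZE - 1u);
}
```

Any stack pointer inside the stack from the figure, for example 0xe800fffc, maps to 0xe800e000. The trick works only because each kernel stack is aligned to its own size.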
fork() call chain









libc fork()
system_call (arch/i386/kernel/entry.S)
sys_clone() (arch/i386/kernel/process.c)
do_fork() (kernel/fork.c)
copy_process() (kernel/fork.c)
p = dup_task_struct(current) // shallow copy
copy_* // copy pointed-to structures
copy_thread () // copy stack, regs, and eip,
// and set child return value
// to 0 via
// childregs->eax = 0;
wake_up_new_task() // set child runnable
exit() call chain






libc exit(code)
system_call (arch/i386/kernel/entry.S)
sys_exit() (kernel/exit.c)
do_exit() (kernel/exit.c)
exit_*() // free data structures
exit_notify() // tell other processes we exit
// reparent children to init
// EXIT_ZOMBIE
// EXIT_DEAD
Context switch call chain





schedule() (kernel/sched.c) (talk about scheduling later)
context_switch()
switch_mm (include/asm-i386/mmu_context.h)
switch address space
switch_to (include/asm-i386/system.h)
switch stack, regs, and %eip
__switch_to (arch/i386/kernel/process.c)
Context switch by stack switch: the idea

Kernel stack captures process states
Registers
task_struct through thread_info

Changing the stack pointer changes the process
[Figure: the task's kernel stack, with the task's thread_info pointing to the task's process descriptor]
Context switch by stack switch: the
implementation (simplified)
[Figure: P0 and P1 each have a kernel stack holding saved registers (eax, …) above a thread_info; P0's stack also holds ret_addr. Each task records a saved esp and eip, with p0->eip = ret_addr. The CPU holds the live esp, eip, eax, ….]
switch_to(p0, p1)
    save registers on stack
    p0->esp = %esp
    p0->eip = ret_addr
    %esp = p1->esp
    push p1->eip
    ret
ret_addr:
    pop registers from stack
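The same idea, that switching stacks switches the running activity, can be tried in user space. The sketch below swaps in the POSIX ucontext API (getcontext, makecontext, swapcontext) as an analogy; it is not how the Linux kernel implements switch_to, and all names in it are hypothetical.

```c
#include <stddef.h>
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];      /* the second activity's own stack */
static char trace[8];
static size_t trace_len;

/* Append one letter to the trace so the execution order is visible. */
static void record(char c)
{
    if (trace_len < sizeof trace - 1)
        trace[trace_len++] = c;
    trace[trace_len] = '\0';
}

static void task_body(void)
{
    record('B');
    swapcontext(&task_ctx, &main_ctx);  /* save our state, resume main */
    record('D');
    /* returning here resumes uc_link, i.e. main_ctx */
}

/* Interleave two activities by switching stacks; returns the trace. */
const char *run_switch_demo(void)
{
    trace_len = 0;
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;       /* where to go when task_body ends */
    makecontext(&task_ctx, task_body, 0);

    record('A');
    swapcontext(&main_ctx, &task_ctx);  /* switch to the task: records B */
    record('C');
    swapcontext(&main_ctx, &task_ctx);  /* resume the task: records D */
    record('E');
    return trace;
}
```

Each swapcontext() call saves the current registers and stack pointer and loads the other context's, which is the user-space counterpart of saving p0->esp and loading p1->esp. The trace comes out as "ABCDE", showing that each side resumes exactly where it last switched away.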