Linux Review COMS W4118 Spring 2008

Linux Overview
History
Distributions
Licensing
Components
    - Kernel, daemons, libraries, utilities, etc.
Modules
Build Process

Core Kernel
[Figure: layered view of a Linux system, top to bottom]
Applications
System Libraries (libc)
System Call Interface
    - I/O Related: File Systems, Networking, Device Drivers
    - Process Related: Scheduler, Memory Management, IPC
    - Modules
Architecture-Dependent Code
Hardware

System Calls
System calls vs. libraries
How to implement ("int 0x80")
Syscall interface
Trapping into the kernel
Dispatch table / jump table
Passing parameters
Accessing user space
Returning values

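The dispatch-table idea from the outline above can be sketched in user space. This is only a model: the two service routines, the two-argument signature, and the ENOSYS_DEMO constant are hypothetical stand-ins, not the kernel's actual table.

```c
#include <stddef.h>

/* Hypothetical service routines; the real ones live inside the kernel. */
static long sys_answer(long a, long b) { (void)a; (void)b; return 42; }
static long sys_add(long a, long b)    { return a + b; }

typedef long (*syscall_fn)(long, long);

/* The dispatch (jump) table: the index is the system call number. */
static const syscall_fn sys_call_table[] = { sys_answer, sys_add };

#define NR_SYSCALLS (sizeof sys_call_table / sizeof sys_call_table[0])
#define ENOSYS_DEMO 38   /* stand-in for the kernel's ENOSYS error code */

/* Model of the handler: validate the number, then jump through the table. */
long dispatch(unsigned long nr, long a, long b)
{
    if (nr >= NR_SYSCALLS)
        return -ENOSYS_DEMO;          /* errors are returned as negative values */
    return sys_call_table[nr](a, b);  /* parameters are passed straight through */
}
```

Calling dispatch(1, 2, 3) selects sys_add by number, which mirrors how the handler picks a service routine from the call number supplied by user space.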
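Invoking System Calls
[Figure: control flow of a system call across the user/kernel boundary]
User mode (restricted privileges)
    - Application making a system call: … xyz() …
    - Wrapper routine in the standard C library: xyz { … int 0x80; … }
Kernel mode (unrestricted privileges)
    - System call handler: system_call: … sys_xyz(); …
    - System call service routine: sys_xyz() { … }
Flow: the application calls the wrapper (call); the wrapper traps into the kernel (int 0x80); the handler calls the service routine (call); the service routine returns (ret); the handler returns to user mode (iret); the wrapper returns to the application (ret)
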
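The two user-mode paths in the figure can be compared on a running Linux system. The sketch below assumes glibc (or a compatible libc) and uses getpid only as a convenient example; note that current C libraries usually enter the kernel with a faster instruction than int 0x80, so this shows the wrapper idea rather than the exact trap mechanism on the slide.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Path 1: go through the wrapper routine in the C library. */
pid_t pid_via_wrapper(void)
{
    return getpid();
}

/* Path 2: pass the system call number to the generic syscall() entry point. */
pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Both functions should return the same process ID, since the wrapper ultimately performs the same trap with the same call number.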
Process Address Space
[Figure: 32-bit virtual address space, top to bottom]
4 GB
    - Kernel space (privilege level 0): kernel-mode stack
3 GB
    - User space (privilege level 3): user-mode stack area, shared runtime libraries, task's code and data
0 GB

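A minimal sketch of the 3 GB boundary shown above, assuming the default 3 GB/1 GB split on 32-bit x86. The helper names are made up for illustration; PAGE_OFFSET is the name the kernel uses for the start of kernel space.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u   /* 3 GB: assumed start of kernel space */

/* True for addresses in the 0 GB to 3 GB range (privilege level 3). */
bool is_user_address(uint32_t addr)
{
    return addr < PAGE_OFFSET;
}

/* True for addresses in the 3 GB to 4 GB range (privilege level 0). */
bool is_kernel_address(uint32_t addr)
{
    return addr >= PAGE_OFFSET;
}
```

A check of this kind is the first step when the kernel validates a pointer passed in by a system call.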
Linux Processes/Tasks
Processes/tasks
The process descriptor: task_struct
Thread context
Task States
Process relationships
Wait queues
Kernel threads
Context switching
Creating processes
Destroying processes

Linux: Processes or Threads?
Linux uses a neutral term: tasks
    - Tasks represent both processes and threads
Linux view
    - Threads: processes that share address space
    - Linux "threads" (tasks) are really "kernel threads"
Lighter-weight than traditional processes
    - File descriptors, VM mappings need not be copied
    - Implication: file table and VM table not part of process descriptor

The Linux process descriptor
[Figure: task_struct with fields state, *stack, flags, *mm, exit_code, *user, pid, *files, *parent, *signal; *mm points to mm_struct (whose *pgd points to pagedir[]), *user to user_struct, *files to files_struct, *signal to signal_struct]
Each process descriptor contains many fields, and some are pointers to other kernel structures, which may themselves include fields that point to structures

The Task Structure
The task_struct is used to represent a task.
The task_struct has several sub-structures that it references:
    - tty_struct: TTY associated with the process
    - fs_struct: current and root directories associated with the process
    - files_struct: file descriptors for the process
    - mm_struct: memory areas for the process
    - signal_struct: signal structures associated with the process
    - user_struct: per-user information (for example, number of current processes)

Task States
From kernel header <linux/sched.h>:
#define TASK_RUNNING            0
#define TASK_INTERRUPTIBLE      1
#define TASK_UNINTERRUPTIBLE    2
#define TASK_STOPPED            4
#define TASK_TRACED             8
#define EXIT_ZOMBIE             16
#define EXIT_DEAD               32
#define TASK_NONINTERACTIVE     64
#define TASK_DEAD               128

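Because every non-running state is a distinct power of two, states can be tested with bit masks. The sketch below repeats three of the values from the header excerpt; the helper functions are hypothetical and only illustrate the masking idea.

```c
#define TASK_RUNNING          0
#define TASK_INTERRUPTIBLE    1
#define TASK_UNINTERRUPTIBLE  2

/* A task is eligible for the CPU only when no state bit is set. */
int is_runnable(long state)
{
    return state == TASK_RUNNING;
}

/* One mask covers both flavors of sleep. */
int is_sleeping(long state)
{
    return (state & (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)) != 0;
}

/* Only an interruptible sleeper may be woken by a signal. */
int signal_can_wake(long state)
{
    return (state & TASK_INTERRUPTIBLE) != 0;
}
```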
Task List, Run Queue
[Figure: the init_task list and the run_queue]
Those tasks that are ready to run comprise a sub-list of all the tasks, and they are arranged on a queue known as the 'run queue'
Those tasks that are blocked while awaiting a specific event are put on alternative sub-lists, called 'wait queues', associated with the particular event(s) that will allow a blocked task to be unblocked

Kernel Wait Queues
[Figure: several wait queues, each a wait_queue_head_t with wait_queue_t entries chained onto it]
A wait_queue_head_t can have 0 or more wait_queue_t entries chained onto it
    - However, usually just one element
Each wait_queue_t contains a list_head of tasks
All processes waiting for a specific "event"
Used for timing, synch, device I/O, etc.

How Do I Block?
By calling one of the sleep_on functions:
    - sleep_on, interruptible_sleep_on, sleep_on_timeout, etc.
These functions create a wait_queue and place the calling task on it
Modify the value of its 'state' variable:
    - TASK_UNINTERRUPTIBLE
    - TASK_INTERRUPTIBLE
Then call schedule or schedule_timeout
The next task to run calls deactivate_task to move us out of the run queue
Only tasks with 'state == TASK_RUNNING' are granted time on the CPU by the scheduler

How Do I Wake Up?
By someone calling one of the wake functions:
    - wake_up, wake_up_all, wake_up_interruptible, etc.
These functions call the curr->func function to wake up the task
    - Defaults to default_wake_function, which is try_to_wake_up
try_to_wake_up calls activate_task to move us back onto the run queue
The 'state' variable is set to TASK_RUNNING
Sooner or later the scheduler will run us again
We then return from schedule or schedule_timeout

Wait Queue Options
INTERRUPTIBLE vs. NON-INTERRUPTIBLE:
    - Can the task be woken up by a signal?
TIMEOUT vs. no timeout:
    - Wake up the task after some timeout interval
EXCLUSIVE vs. NON-EXCLUSIVE:
    - Should only one task be woken up?
    - Only one EXCLUSIVE task is woken up
        - Kept at end of the list
    - All NON-EXCLUSIVE tasks are woken up
        - Kept at head of the list
Functions with the _nr option wake up a given number of tasks

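The exclusive versus non-exclusive rule can be modeled with a small user-space simulation. Everything here is a simplified stand-in: the queue is an array instead of a linked list, struct waiter and wake_up_nr are hypothetical names, and a count of 0 is assumed to mean "no limit".

```c
#include <stddef.h>

enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1 };

struct waiter {
    int state;      /* TASK_* value of the waiting task */
    int exclusive;  /* nonzero for an EXCLUSIVE waiter */
};

/*
 * Wake every non-exclusive waiter and at most nr_exclusive exclusive ones.
 * The array is ordered like the list on the slide: non-exclusive waiters
 * at the head, exclusive waiters at the end.
 * Returns the number of tasks woken.
 */
int wake_up_nr(struct waiter *q, size_t n, int nr_exclusive)
{
    int woken = 0;

    for (size_t i = 0; i < n; i++) {
        if (q[i].state == TASK_RUNNING)
            continue;                     /* already awake, nothing to do */
        q[i].state = TASK_RUNNING;
        woken++;
        if (q[i].exclusive && --nr_exclusive == 0)
            break;                        /* exclusive quota used up */
    }
    return woken;
}
```

Keeping exclusive waiters at the end is what makes a single pass sufficient: all non-exclusive tasks are reached before the walk can stop.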
Context Switching
Context switching is the process of saving the state of the currently running task and loading the state of the next task to run.
This involves saving the task's CPU state (registers), changing the current task value, and loading the CPU state of the new task into the registers.
schedule determines the next task to run, calls context_switch, which calls switch_mm to change the process address space, then calls switch_to to context switch to the new task.
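A user-space analogy of saving one register state and loading another can be written with the ucontext functions from glibc. This is not how the kernel's switch_to works internally; it only shows the save-and-load pattern, and the names task_body, run_switch_demo, and trace are made up for the sketch.

```c
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];   /* the second "task" gets its own stack */
static int trace[4];                 /* records the order in which code ran */
static int trace_len;

static void task_body(void)
{
    trace[trace_len++] = 2;
    swapcontext(&task_ctx, &main_ctx);   /* save our state, resume main */
    trace[trace_len++] = 4;
    /* returning here resumes uc_link, which is main_ctx */
}

/* Switches back and forth twice; returns the number of recorded steps. */
int run_switch_demo(void)
{
    trace_len = 0;
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task_body, 0);

    trace[trace_len++] = 1;
    swapcontext(&main_ctx, &task_ctx);   /* save main, load the task */
    trace[trace_len++] = 3;
    swapcontext(&main_ctx, &task_ctx);   /* resume the task where it stopped */
    return trace_len;
}
```

Each swapcontext call stores the caller's registers, including its stack pointer, and loads the saved set of the other context, so execution continues exactly where that context last stopped.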
The Role of the Stack
One process must save state where another can find it
When the new state is loaded, the CPU is running another process -- the state is the process!
The stack pointer determines most of the state
Some of the registers are on the stack
The stack pointer determines the location of thread_info, which also points to task_struct
Changing the stack pointer changes the process!

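How the stack pointer locates thread_info can be shown with one mask operation. The sketch assumes 32-bit x86 with 8 KB kernel stacks aligned on an 8 KB boundary and thread_info at the lowest address of the stack; thread_info_base is a hypothetical helper name.

```c
#include <stdint.h>

#define THREAD_SIZE 8192u   /* assumed: 8 KB kernel stack per task */

/* Round the stack pointer down to the start of its stack area. */
uint32_t thread_info_base(uint32_t sp)
{
    return sp & ~(THREAD_SIZE - 1u);
}
```

Any stack pointer value inside the same 8 KB area yields the same base, which is why loading a different stack pointer is enough to change which task counts as current.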
Context Switch: FP Registers
On context switch:
    - Hardware flag set: TS in cr0
    - Software flag TS_USEDFPU is cleared in task_struct
If task uses floating point instruction and hardware flag is set:
    - Hardware raises "device not available" exception (trap)
    - Kernel restores floating point registers
    - TS is cleared
    - TS_USEDFPU is set in the task_struct for this process
Any time TS_USEDFPU is set, floating point registers are saved for that process at switch time (but not restored for the next)
Bottom line: only done if needed; if only one process uses floating point, no save/restore needed

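The lazy save and restore protocol can be modeled as a small state machine. This is a simulation under simplifying assumptions: a single int stands in for the floating point registers, and struct cpu, struct task, and the counters are hypothetical.

```c
#include <stdbool.h>

struct task {
    bool used_fpu;   /* models TS_USEDFPU */
    int  fp_saved;   /* saved copy of the FP registers */
};

struct cpu {
    bool ts;         /* models the TS bit in cr0 */
    int  fp_regs;    /* the live FP register contents */
    int  saves;      /* counters, for demonstration only */
    int  restores;
};

/* Context switch away from prev: save FP state only if prev used the FPU. */
void switch_fpu(struct cpu *c, struct task *prev)
{
    if (prev->used_fpu) {
        prev->fp_saved = c->fp_regs;
        c->saves++;
        prev->used_fpu = false;
    }
    c->ts = true;    /* the next FP instruction will trap */
}

/* "Device not available" trap: restore lazily and clear TS. */
void fp_trap(struct cpu *c, struct task *cur)
{
    c->fp_regs = cur->fp_saved;
    c->restores++;
    c->ts = false;
    cur->used_fpu = true;
}

/* The current task executes an FP instruction that writes value. */
void use_fpu(struct cpu *c, struct task *cur, int value)
{
    if (c->ts)
        fp_trap(c, cur);
    c->fp_regs = value;
}
```

A task that never touches floating point never sets used_fpu in this model, so switching away from it costs no save.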
Process Creation and Deletion
fork and clone system calls
do_fork is the kernel implementation:
    - CLONE_VM - share address space
    - CLONE_FS - share root and current working directories
    - CLONE_FILES - share file descriptors
    - CLONE_SIGHAND - share signal handlers
    - CLONE_PARENT - child gets the same parent as the calling process
    - CLONE_THREAD - create thread for process
exit and wait system calls, zombie processes
do_exit, release_task

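From user space, the creation and deletion life cycle looks like the sketch below, which uses the standard fork, _exit, and waitpid calls. The function name spawn_and_reap is made up for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; reap it and return its exit status. */
int spawn_and_reap(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;            /* fork failed */
    if (pid == 0)
        _exit(code);          /* child: terminate immediately */

    int status;
    /* Until the parent waits, the exited child remains a zombie. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The waitpid call is what lets the kernel release the child's process descriptor; a parent that never waits leaves the zombie entry in place.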
Kernel Threads
Linux has a small number of kernel threads that run continuously in the kernel (daemons)
    - No user address space
    - Only execute code and access data in kernel address space
How to create: kernel_thread
Scheduled in the same way as other threads/tasks
Process 0: idle process
Process 1: init process
    - Spawns several kernel threads before transitioning to user mode as /sbin/init
    - kflushd (bdflush): flushes dirty buffers to disk under "memory pressure"
    - kupdate: periodically flushes old buffers to disk
    - kswapd: swapping daemon

Scheduling Philosophies
Priority is the primary scheduling mechanism
Priority is dynamically adjusted at run time
    - Processes denied access to CPU get increased
    - Processes running a long time get decreased
Try to distinguish interactive processes from noninteractive
    - Bonus or penalty reflecting whether I/O or compute bound
Use large quanta for important processes
    - Modify quanta based on CPU use
    - Quantum != clock tick
Associate processes to CPUs
Do everything in O(1) time

Runqueue for O(1) Scheduler
[Figure: the runqueue holds two priority arrays, active and expired; each priority array is a set of priority queues, one per priority level]
Higher priority: more I/O, 800 ms quanta
Lower priority: more CPU, 10 ms quanta

Basic Scheduling Algorithm
Find the highest-priority queue with a runnable process
Find the first process on that queue
Calculate its quantum size
Let it run
When its time is up, put it on the expired list
Repeat

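The first step, finding the highest-priority non-empty queue in constant time, is commonly done with a bitmap holding one bit per priority level. The sketch below is a simplified user-space model: per-priority counters stand in for the real task lists, 140 levels and 32-bit words are assumed, and __builtin_ctz is a GCC/Clang builtin.

```c
#define MAX_PRIO      140
#define BITS_PER_WORD 32
#define BITMAP_WORDS  ((MAX_PRIO + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct prio_array {
    unsigned int bitmap[BITMAP_WORDS];  /* bit p set: queue p is non-empty */
    int nr_queued[MAX_PRIO];            /* stand-in for the per-priority lists */
};

void enqueue(struct prio_array *a, int prio)
{
    a->nr_queued[prio]++;
    a->bitmap[prio / BITS_PER_WORD] |= 1u << (prio % BITS_PER_WORD);
}

void dequeue(struct prio_array *a, int prio)
{
    if (--a->nr_queued[prio] == 0)
        a->bitmap[prio / BITS_PER_WORD] &= ~(1u << (prio % BITS_PER_WORD));
}

/*
 * The lowest set bit is the numerically lowest, i.e. highest, priority.
 * Returns MAX_PRIO when the array is empty.
 */
int highest_prio(const struct prio_array *a)
{
    for (int w = 0; w < BITMAP_WORDS; w++)
        if (a->bitmap[w])
            return w * BITS_PER_WORD + __builtin_ctz(a->bitmap[w]);
    return MAX_PRIO;
}
```

The search inspects at most five words no matter how many tasks are queued, which is where the O(1) bound comes from.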
Scheduling Components
Static Priority
Sleep Average
Bonus
Interactivity Status
Dynamic Priority

Time Slice based on Priority
Priority    Static Pri    Niceness    Quantum
Highest     100           -20         800 ms
High        110           -10         600 ms
Normal      120             0         100 ms
Low         130           +10          50 ms
Lowest      139           +19           5 ms

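The rows of the table are consistent with a simple two-slope rule: 20 ms per level below static priority 120, and 5 ms per level from 120 up. The sketch below encodes that rule; the function names are hypothetical, and the mapping static priority = 120 + nice is assumed.

```c
#define MAX_PRIO 140

/* Map a nice value (-20..19) to a static priority (100..139). */
int nice_to_static_prio(int nice)
{
    return 120 + nice;
}

/* Base time quantum in milliseconds for a conventional process. */
int base_quantum_ms(int static_prio)
{
    int scale = (static_prio < 120) ? 20 : 5;
    return (MAX_PRIO - static_prio) * scale;
}
```

For example, a nice value of -10 gives static priority 110 and a quantum of (140 - 110) * 20 = 600 ms.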
Priority Array Swapping
The system only runs processes from active queues, and puts them on expired queues when they use up their quanta
When a priority level of the active queue is empty, the scheduler looks for the next-highest priority queue
After running all of the active queues, the active and expired queues are swapped
There are pointers to the current arrays; at the end of a cycle, the pointers are switched

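The pointer switch at the end of a cycle can be sketched as follows. The structures are reduced to a task counter per array, and all names are stand-ins chosen for the example.

```c
struct prio_array {
    int nr_active;                 /* tasks currently queued in this array */
};

struct runqueue {
    struct prio_array *active;     /* tasks that still have quantum left */
    struct prio_array *expired;    /* tasks that used up their quantum */
    struct prio_array arrays[2];
};

void rq_init(struct runqueue *rq, int nr_tasks)
{
    rq->arrays[0].nr_active = nr_tasks;
    rq->arrays[1].nr_active = 0;
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

/* A task used up its quantum: move it from active to expired. */
void task_expired(struct runqueue *rq)
{
    rq->active->nr_active--;
    rq->expired->nr_active++;
}

/* Swap the two pointers once the active array has drained. Returns 1 if swapped. */
int swap_if_drained(struct runqueue *rq)
{
    if (rq->active->nr_active != 0)
        return 0;
    struct prio_array *tmp = rq->active;
    rq->active = rq->expired;
    rq->expired = tmp;
    return 1;
}
```

Swapping two pointers costs the same regardless of how many tasks expired, so starting a new cycle needs no per-task work.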
Real-Time Scheduling
Linux has soft real-time scheduling
    - No hard real-time guarantees
All real-time processes are higher priority than any conventional processes
    - Processes with priorities [0, 99] are real-time
First-in, first-out: SCHED_FIFO
    - Static priority
    - Process is only preempted for a higher-priority process
    - No time quanta; it runs until it blocks or yields voluntarily
    - RR within same priority level
Round-robin: SCHED_RR
    - As above but with a time quantum (800 ms)
Normal processes have SCHED_OTHER scheduling policy

Multiprocessor Scheduling
Each processor has a separate run queue
Each processor only selects processes from its own queue to run
Yes, it's possible for one processor to be idle while others have jobs waiting in their run queues
Periodically, the queues are rebalanced: if one processor's run queue is too long, some processes are moved from it to another processor's queue

Processor Affinity
Each process has a bitmask saying what CPUs it can run on
Normally, of course, all CPUs are listed
Processes can change the mask
The mask is inherited by child processes (and threads), thus tending to keep them on the same CPU
Rebalancing does not override affinity

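On Linux, a process can read and change its own affinity mask through sched_getaffinity and sched_setaffinity. The sketch below assumes glibc with _GNU_SOURCE and fewer CPUs than CPU_SETSIZE; the two function names are made up for the example.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling process to the first CPU its mask allows.
 * Returns that CPU number, or -1 on failure. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t mask;
    if (sched_getaffinity(0, sizeof mask, &mask) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &mask)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof one, &one) == 0 ? cpu : -1;
        }
    }
    return -1;
}

/* Number of CPUs the calling process may currently run on, or -1 on failure. */
int allowed_cpu_count(void)
{
    cpu_set_t mask;
    if (sched_getaffinity(0, sizeof mask, &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}
```

After pinning, allowed_cpu_count should report 1, and children created afterwards start with the same single-CPU mask.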