Linux Operating System
許富皓

Chapter 3: Processes

Non-circular Doubly Linked Lists
A sequence of nodes chained together through two kinds of pointers: a pointer to the previous node and a pointer to the subsequent node.
Each node has two links:
- one points to the previous node, or to a null value or empty list if it is the first node;
- one points to the next node, or to a null value or empty list if it is the final node.

Problems with Doubly Linked Lists
The Linux kernel contains hundreds of data structures that are linked together through their own doubly linked lists.
Drawbacks:
- Programmer effort is wasted reimplementing the same set of primitive operations, such as initializing the list, inserting and deleting an element, and scanning the list.
- Memory is wasted replicating the primitive operations for each different list.

Data Structure struct list_head
Therefore, the Linux kernel defines the struct list_head data structure, whose only fields, next and prev, represent the forward and back pointers of a generic doubly linked list element, respectively.
It is important to note, however, that the pointers in a list_head field store the addresses of other list_head fields rather than the addresses of the whole data structures in which the list_head structure is included.

A Circular Doubly Linked List with Three Elements
[Figure: data structure 1, data structure 2, and data structure 3, and four list_head fields linked by next and prev pointers.]

Macro LIST_HEAD(list_name)
A new list is created by using the LIST_HEAD(list_name) macro.
- It declares a new variable named list_name of type list_head, a dummy first element that acts as a placeholder for the head of the new list.
- It initializes the prev and next fields of the list_head data structure so that they point to the list_name variable itself.
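The two points above (links store the addresses of embedded list_head fields, and an empty head points to itself) can be checked with a small user-space sketch. This is not kernel code: struct item, sketch_add_tail(), sketch_is_empty(), and list_head_demo() are hypothetical names introduced only for illustration, and the tail-insertion helper is a simplified stand-in written for this example.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* Hypothetical payload type that embeds a list_head field. */
struct item {
    int value;
    struct list_head link;
};

/* Illustrative helper: link new_node just before head, i.e. at the tail. */
void sketch_add_tail(struct list_head *new_node, struct list_head *head)
{
    struct list_head *last = head->prev;

    new_node->next = head;
    new_node->prev = last;
    last->next = new_node;
    head->prev = new_node;
}

/* A list is empty when its head points back to itself. */
int sketch_is_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Returns 1 if every property described in the text holds, 0 otherwise. */
int list_head_demo(void)
{
    LIST_HEAD(my_list);
    struct item a = { 1, { NULL, NULL } };
    struct item b = { 2, { NULL, NULL } };

    /* Freshly declared head: next and prev both point to the head itself. */
    if (!sketch_is_empty(&my_list) || my_list.prev != &my_list)
        return 0;

    sketch_add_tail(&a.link, &my_list);
    sketch_add_tail(&b.link, &my_list);

    /* The links hold addresses of list_head fields, not of struct item. */
    if (my_list.next != &a.link || a.link.next != &b.link)
        return 0;

    /* The list is circular: the last element leads back to the head. */
    if (b.link.next != &my_list || my_list.prev != &b.link)
        return 0;

    return 1;
}
```

Calling list_head_demo() from any test program should return 1. Recovering the enclosing struct item from a link pointer needs the list_entry macro, which the slides introduce next.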
Code of Macro LIST_HEAD(list_name)
    struct list_head {
        struct list_head *next, *prev;
    };

    #define LIST_HEAD_INIT(name) { &(name), &(name) }

    #define LIST_HEAD(name) \
        struct list_head name = LIST_HEAD_INIT(name)

An Empty Doubly Linked List
[Figure: after LIST_HEAD(my_list), the next and prev fields of struct list_head my_list both point to my_list itself.]

Related Functions and Macros (1)
- list_add(n,p)
- list_add_tail(n,p)
- list_del(p)
- list_empty(p)

Related Functions and Macros (2)
- list_for_each(p,h)
- list_for_each_entry(p,h,m)
- list_entry(p,t,m): returns the address of the data structure of type t in which the list_head field that has the name m and the address p is included.

Example of list_entry(p,t,m)
    struct class {
        char name[20];
        char teacher[20];
        struct student_pointer *student;
        struct list_head link;
    };

    struct class grad_1A;
    struct list_head *poi;

    poi = &(grad_1A.link);

list_entry(poi, struct class, link) evaluates to &grad_1A.
[Figure: layout of grad_1A: name (20 bytes), teacher (20 bytes), student (4 bytes), and link, made of next (4 bytes) and prev (4 bytes).]

Code of list_entry
    typedef unsigned int __kernel_size_t;
    typedef __kernel_size_t size_t;

    #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

    #define list_entry(ptr, type, member) \
    ({ \
        const typeof(((type *)0)->member) *__mptr = (ptr); \
        (type *)((char *)__mptr - offsetof(type, member)); \
    })

Explanation of list_entry(p,t,m)
    #define list_entry(ptr, type, member) \
        ((type *)((char *)(ptr) - (unsigned long)(&((type *)0)->member)))

[Figure: in the layout of grad_1A, offset is the distance from the start of the structure to its link field, and list_entry() = poi - offset.]
list_entry(poi, struct class, link) expands to:
    ((struct class *)((char *)(poi) - (unsigned long)(&((struct class *)0)->link)))

hlist_head
The Linux kernel 2.6 supports another kind of doubly linked list, which mainly differs from a list_head list in that it is NOT circular.
It is mainly used for hash tables.
The list head is stored in an hlist_head data structure, which is simply a pointer to the first element in the list (NULL if the list is empty).

hlist_node
Each element is represented by an hlist_node data structure, which includes a pointer next to the next element and a pointer pprev to the next field of the previous element.
Because the list is not circular, the next field of the last element is set to NULL. In the kernel sources, hlist_add_head( ) makes the pprev field of the first element point to the first field of the hlist_head, so every element can be unlinked in the same way.

A Non-circular Doubly Linked List
[Figure: a struct hlist_head with its first pointer, followed by a chain of struct hlist_node elements linked through their next and pprev fields.]

Functions and Macros for hlist_head and hlist_node
The list can be handled by means of several helper functions and macros similar to those listed earlier for list_head: hlist_add_head( ), hlist_del( ), hlist_empty( ), hlist_entry, hlist_for_each_entry, and so on.

The Process List
The process list is a circular doubly linked list that links the process descriptors of all existing thread group leaders.
Each task_struct structure includes a tasks field of type list_head whose prev and next fields point, respectively, to the tasks field of the previous and of the next task_struct element.

The Head of the Process List
The head of the process list is the init_task task_struct descriptor; it is the process descriptor of the so-called process 0 or swapper (see the section "Kernel Threads" later in this chapter).
The tasks.prev field of init_task points to the tasks field of the process descriptor inserted last in the list.
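The shape of the process list described above can be sketched in user space as follows. This is a minimal illustration and not the kernel's code: struct mini_task is a hypothetical, heavily reduced stand-in for task_struct, sketch_add_tail() and mini_process_list_demo() are names invented for this example, and list_entry is written in the one-line form built on the standard offsetof.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* One-line container-of form of list_entry, using the standard offsetof. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced stand-in for task_struct. */
struct mini_task {
    int pid;
    struct list_head tasks;
};

/* Illustrative helper: insert new_node at the tail of the list. */
void sketch_add_tail(struct list_head *new_node, struct list_head *head)
{
    struct list_head *last = head->prev;

    new_node->next = head;
    new_node->prev = last;
    last->next = new_node;
    head->prev = new_node;
}

/*
 * Builds a list headed by a descriptor playing the role of init_task,
 * checks that the head's tasks.prev points to the tasks field of the
 * descriptor inserted last, then walks the ring and returns the sum of
 * the PIDs it visits (or -1 if the check fails).
 */
int mini_process_list_demo(void)
{
    struct mini_task head = { 0, LIST_HEAD_INIT(head.tasks) };
    struct mini_task a = { 1, { NULL, NULL } };
    struct mini_task b = { 2, { NULL, NULL } };
    struct mini_task c = { 3, { NULL, NULL } };
    struct list_head *pos;
    int pid_sum = 0;

    sketch_add_tail(&a.tasks, &head.tasks);
    sketch_add_tail(&b.tasks, &head.tasks);
    sketch_add_tail(&c.tasks, &head.tasks);

    if (head.tasks.prev != &c.tasks)
        return -1;

    /* Start past the head and stop when the ring leads back to it. */
    for (pos = head.tasks.next; pos != &head.tasks; pos = pos->next)
        pid_sum += list_entry(pos, struct mini_task, tasks)->pid;

    return pid_sum;
}
```

With the three descriptors above, the walk visits PIDs 1, 2, and 3 and skips the head, so the function returns 6.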
Code for init_task
    struct task_struct init_task = INIT_TASK(init_task);

    #define INIT_TASK(tsk) \
    { \
        .state        = 0, \
        .thread_info  = &init_thread_info, \
        .usage        = ATOMIC_INIT(2), \
        .flags        = 0, \
        .lock_depth   = -1, \
        .prio         = MAX_PRIO-20, \
        .static_prio  = MAX_PRIO-20, \
        .policy       = SCHED_NORMAL, \
        .cpus_allowed = CPU_MASK_ALL, \
        .mm           = NULL, \
        .active_mm    = &init_mm, \
        .run_list     = LIST_HEAD_INIT(tsk.run_list), \
        : \
        : \
    }

Insert and Delete a Process Descriptor from the Process List
The SET_LINKS and REMOVE_LINKS macros are used to insert and to remove a process descriptor in the process list, respectively.
These macros also take care of the parenthood relationship of the process (see the section "How Processes Are Organized" later in this chapter).

Scanning the Whole Process List with Macro for_each_process
    #define for_each_process(p) \
        for (p = &init_task; (p = list_entry((p)->tasks.next, \
                                             struct task_struct, tasks)) != &init_task; )

The macro starts by moving PAST init_task to the next task and continues until it reaches init_task again (thanks to the circularity of the list).
At each iteration, the variable p passed as the argument of the macro contains the address of the currently scanned process descriptor, as returned by the list_entry macro.

Example
Macro for_each_process scans the whole thread group leader list.
The macro is the loop control statement, after which the kernel programmer supplies the loop body, e.g.:
    counter = 1;  /* for init_task */
    for_each_process(t) {
        if (t->state == TASK_RUNNING)
            ++counter;
    }

The Lists of TASK_RUNNING Processes in Early Linux Versions
When looking for a new process to run on a CPU, the kernel has to consider only the runnable processes (that is, the processes in the TASK_RUNNING state).
Earlier Linux versions put all runnable processes in the same list called runqueue.
Because it would be too costly to maintain the list ordered according to process priorities, the earlier schedulers were compelled to scan the whole list in order to select the "best" runnable process.
Linux 2.6 implements the runqueue differently.

The Lists of TASK_RUNNING Processes in Linux Version 2.6
Linux 2.6 achieves the scheduler speedup by splitting the runqueue into many lists of runnable processes, one list per process priority.
Each task_struct descriptor includes a run_list field of type list_head.
If the process priority is equal to k (a value ranging between 0 and 139), the run_list field links the process descriptor into the list of runnable processes having priority k.

runqueue in a Multiprocessor System
Furthermore, on a multiprocessor system, each CPU has its own runqueue, that is, its own set of lists of processes.

Trade-off of runqueue
runqueue is a classic example of making a data structure more complex to improve performance: to make scheduler operations more efficient, the runqueue list has been split into 140 different lists!

The Main Data Structures of a runqueue
The kernel must preserve a lot of data for every runqueue in the system.
The main data structures of a runqueue are the lists of process descriptors belonging to the runqueue.
All these lists are implemented by a single prio_array_t (= struct prio_array) data structure.

struct prio_array
    struct prio_array {
        unsigned int     nr_active;
        unsigned long    bitmap[BITMAP_SIZE];
        struct list_head queue[MAX_PRIO];
    };

- nr_active: the number of process descriptors linked into the lists.
- bitmap: a priority bitmap; each flag is set if and only if the corresponding priority list is not empty.
- queue: the 140 heads of the priority lists.

The prio and array Fields of a Process Descriptor
The prio field of the process descriptor stores the dynamic priority of the process.
The array field is a pointer to the prio_array_t data structure of its current runqueue.
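How the three prio_array fields cooperate can be illustrated with a user-space sketch. Everything here is an assumption made for illustration: struct mini_task is a hypothetical stand-in for task_struct, the sketch_* helpers and prio_array_demo() are invented names, BITMAP_SIZE is computed in a simplified way, and the bit-by-bit scan stands in for whatever find-first-bit primitive a real scheduler would use. It also assumes, as in Linux 2.6, that a lower value means a higher priority.

```c
#include <stddef.h>
#include <limits.h>

#define MAX_PRIO      140
#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))
/* Simplified sizing (an assumption): just enough longs for MAX_PRIO bits. */
#define BITMAP_SIZE   ((MAX_PRIO + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct list_head {
    struct list_head *next, *prev;
};

struct prio_array {
    unsigned int     nr_active;
    unsigned long    bitmap[BITMAP_SIZE];
    struct list_head queue[MAX_PRIO];
};

/* Hypothetical, reduced stand-in for task_struct. */
struct mini_task {
    int prio;
    struct list_head run_list;
    struct prio_array *array;
};

/* Empty array: no active tasks, clear bitmap, every head points to itself. */
void sketch_array_init(struct prio_array *array)
{
    int i;

    array->nr_active = 0;
    for (i = 0; i < BITMAP_SIZE; i++)
        array->bitmap[i] = 0;
    for (i = 0; i < MAX_PRIO; i++)
        array->queue[i].next = array->queue[i].prev = &array->queue[i];
}

/* Append p to the list for its priority and keep bitmap and count in step. */
void sketch_enqueue(struct mini_task *p, struct prio_array *array)
{
    struct list_head *head = &array->queue[p->prio];

    p->run_list.next = head;
    p->run_list.prev = head->prev;
    head->prev->next = &p->run_list;
    head->prev = &p->run_list;

    array->bitmap[p->prio / BITS_PER_LONG] |= 1UL << (p->prio % BITS_PER_LONG);
    array->nr_active++;
    p->array = array;
}

/* Lowest-numbered non-empty priority list, or MAX_PRIO if all are empty. */
int sketch_first_prio(const struct prio_array *array)
{
    int prio;

    for (prio = 0; prio < MAX_PRIO; prio++)
        if (array->bitmap[prio / BITS_PER_LONG] & (1UL << (prio % BITS_PER_LONG)))
            return prio;
    return MAX_PRIO;
}

/* Enqueues two tasks and returns the best priority found (-1 on failure). */
int prio_array_demo(void)
{
    struct prio_array array;
    struct mini_task a = { 120, { NULL, NULL }, NULL };
    struct mini_task b = { 100, { NULL, NULL }, NULL };

    sketch_array_init(&array);
    if (sketch_first_prio(&array) != MAX_PRIO)
        return -1;

    sketch_enqueue(&a, &array);
    sketch_enqueue(&b, &array);
    if (array.nr_active != 2 || a.array != &array)
        return -1;

    return sketch_first_prio(&array);
}
```

The point of the design is that selecting the next list only consults the bitmap, never the tasks themselves: with tasks at priorities 120 and 100 enqueued, prio_array_demo() returns 100 without walking any list.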
Note: each CPU has its own runqueue.

Scheduler-related Fields of a Process Descriptor
[Figure: a prio_array_t (unsigned int nr_active, unsigned long bitmap[5], struct list_head queue[140]) whose queue entries head lists of struct task_struct descriptors; each task_struct shows its int prio, struct list_head run_list (prev, next), and prio_array_t *array fields.]

Function enqueue_task(p,array)
The enqueue_task(p,array) function inserts a process descriptor into a runqueue list; its code is essentially equivalent to:
    list_add_tail(&p->run_list, &array->queue[p->prio]);
    __set_bit(p->prio, array->bitmap);
    array->nr_active++;
    p->array = array;

Function dequeue_task(p,array)
Similarly, the dequeue_task(p,array) function removes a process descriptor from a runqueue list.

Relationships among Processes
Processes created by a program have a parent/child relationship.
When a process creates multiple children, these children have sibling relationships.
Several fields must be introduced in a process descriptor to represent these relationships with respect to a given process P.
Processes 0 and 1 are created by the kernel.
Process 1 (init) is the ancestor of all other processes.

Fields of a Process Descriptor Used to Express Parenthood Relationships (1)
real_parent: points to the process descriptor of the process that created P, or to the descriptor of process 1 (init) if the parent process no longer exists.
Therefore, when a user starts a background process and exits the shell, the background process becomes the child of init.

Fields of a Process Descriptor Used to Express Parenthood Relationships (2)
parent: points to the current parent of P, that is, the process that must be signaled when the child process terminates.
Its value usually coincides with that of real_parent.
It may occasionally differ, for example when another process issues a ptrace( ) system call requesting that it be allowed to monitor P (see the section "Execution Tracing" in Chapter 20).

Fields of a Process Descriptor Used to Express Parenthood Relationships (3)
- struct list_head children: the head of the list containing all children created by P. This list is formed through the sibling field of the child processes.
- struct list_head sibling: the pointers to the next and previous elements in the list of the sibling processes, those that have the same parent as P.

Note, from the kernel source:
    /* children/sibling forms the list of my children plus the tasks I'm ptracing. */
    struct list_head children;  /* list of my children */
    struct list_head sibling;   /* linkage in my parent's children list */

The children Field of a Parent Process Points to the sibling Field of a Child Process
    #define add_parent(p, parent) \
        list_add_tail(&(p)->sibling, &(parent)->children)

    #define SET_LINKS(p) do { \
        if (thread_group_leader(p)) \
            list_add_tail(&(p)->tasks, &init_task.tasks); \
        add_parent(p, (p)->parent); \
    } while (0)

Iterate over a Process's Children
Similarly, it is possible to iterate over a process's children with list_for_each:
    #define list_for_each(pos, head) \
        for (pos = (head)->next; pos != (head); pos = pos->next)

    struct task_struct *task;
    struct list_head *list;

    list_for_each(list, &current->children) {
        task = list_entry(list, struct task_struct, sibling);
        /* task now points to one of current's children */
    }

Example
Process P0 successively created P1, P2, and P3. Process P3, in turn, created process P4.
The children/sibling fields form the list of children of P0 (the marked links in the figure).

Process Groups
Modern Unix operating systems introduce the notion of process groups to represent a job abstraction.
For example, in order to execute the command line:
    $ ls | sort | more
a shell that supports process groups, such as bash, creates a new group for the three processes corresponding to ls, sort, and more.
In this way, the shell acts on the three processes as if they were a single entity (the job, to be precise).

Process Groups [waikato]
- One important feature is that it is possible to send a signal to every process in the group.
- Process groups are used for distribution of signals, and by terminals to arbitrate requests for their input and output.

Process Groups [waikato]
Foreground Process Groups
- A foreground process has read and write access to the terminal.
- Every process in the foreground process group receives the SIGINT (^C), SIGQUIT (^\), and SIGTSTP signals.
Background Process Groups
- A background process does not have read access to the terminal.
- If a background process attempts to read from its controlling terminal, its process group will be sent a SIGTTIN.

Group Leaders and Process Group IDs
Each process descriptor includes a field containing the process group ID.
Each group of processes may have a group leader, which is the process whose PID coincides with the process group ID.

Creation of a New Process Group [Bhaskar]
A newly created process is initially inserted into the process group of its parent.
After doing a fork, the shell explicitly calls setpgid to set the process group of the child.
The process group is explicitly set for purposes of job control.
When a command is given at the shell prompt, the resulting process (or processes, if there is piping) is assigned a new process group.

Login Sessions
Modern Unix kernels also introduce login sessions.
Informally, a login session contains all processes that are descendants of the process that has started a working session on a specific terminal (usually, the first command shell process created for the user).

Login Sessions vs. Process Groups
All processes in a process group must be in the same login session.
A login session may have several process groups active simultaneously; one of these process groups is always in the foreground, which means that it has access to the terminal.
The other active process groups are in the background.
When a background process tries to access the terminal, it receives a SIGTTIN or SIGTTOU signal.
In many command shells, the internal commands bg and fg can be used to put a process group in either the background or the foreground.

Other Relationships between Processes
There exist other relationships among processes: a process can be a leader of a process group or of a login session, it can be a leader of a thread group, and it can also trace the execution of other processes (see the section "Execution Tracing" in Chapter 20).

Other Process Relationship Fields of a Process Descriptor P (1)
    Field                               Description
    struct task_struct *group_leader    Process descriptor pointer of the group leader of P.
    signal->pgrp                        PID of the group leader of P.
    pid_t tgid                          PID of the thread group leader of P.
    signal->session                     PID of the login session leader of P.

Other Process Relationship Fields of a Process Descriptor P (2)
- struct list_head ptrace_children: the head of a list containing all children of P being traced by a debugger.
- struct list_head ptrace_list: the pointers to the next and previous elements in the real parent's list of traced processes (used when P is being traced).

More about ptrace_list and ptrace_children [1]
    /* ptrace_list/ptrace_children forms the list of my children that were stolen by a ptracer.
     */

More about ptrace_list and ptrace_children [2]
    void __ptrace_link(task_t *child, task_t *new_parent)
    {
        if (!list_empty(&child->ptrace_list))
            BUG();
        if (child->parent == new_parent)
            return;
        list_add(&child->ptrace_list, &child->parent->ptrace_children);
        REMOVE_LINKS(child);
        child->parent = new_parent;
        SET_LINKS(child);
    }

The pidhash Table and Chained Lists
In several circumstances, the kernel must be able to derive the process descriptor pointer corresponding to a PID.
This occurs, for instance, in servicing the kill( ) system call.
When process P1 wishes to send a signal to another process, P2, it invokes the kill( ) system call specifying the PID of P2 as the parameter.
The kernel derives the process descriptor pointer from the PID and then extracts the pointer to the data structure that records the pending signals from P2's process descriptor.

Multiple Hash Tables
The process descriptor includes fields that represent different types of PID, and each type of PID requires its own hash table.
For this reason, FOUR hash tables have been introduced to derive the process descriptor pointer corresponding to a PID.

The Four Hash Tables and Corresponding Fields in the Process Descriptor
    Hash Table Type    Field Name    Description
    PIDTYPE_PID        pid           PID of the process
    PIDTYPE_TGID       tgid          PID of thread group leader process
    PIDTYPE_PGID       pgrp          PID of the group leader process
    PIDTYPE_SID        session       PID of the session leader process

Array pid_hash
The four hash tables are dynamically allocated during the kernel initialization phase, and their addresses are stored in the pid_hash array.
    static struct hlist_head *pid_hash[PIDTYPE_MAX];

Initialization of the Four Hash Tables: pidhash_init(void)
    for (i = 0; i < PIDTYPE_MAX; i++) {
        pid_hash[i] = alloc_bootmem(pidhash_size * sizeof(*(pid_hash[i])));
        if (!pid_hash[i])
            panic("Could not alloc pidhash!\n");
        for (j = 0; j < pidhash_size; j++)
            INIT_HLIST_HEAD(&pid_hash[i][j]);
    }

The Four Initialized Hash Tables
[Figure: each entry of pid_hash is a struct hlist_head * pointing to a table of struct hlist_head elements indexed [0] through [2047].]

PID and Table Index
The PID is transformed into a table index using the pid_hashfn macro, which expands to:
    #define pid_hashfn(x) hash_long((unsigned long) x, pidhash_shift)

The pidhash_shift variable stores the length in bits of a table index (11, in our example).
The hash_long( ) function is used by many hash functions; on a 32-bit architecture it is essentially equivalent to:
    unsigned long hash_long(unsigned long val, unsigned int bits)
    {
        unsigned long hash = val * 0x9e370001UL;
        return hash >> (32 - bits);
    }

Because in our example pidhash_shift is equal to 11, pid_hashfn yields values ranging between 0 and 2^11 - 1 = 2047.

Colliding
As every basic computer science course explains, a hash function does not always ensure a one-to-one correspondence between PIDs and table indexes.
Two different PIDs that hash into the same table index are said to be colliding.
Linux uses chaining to handle colliding PIDs; each table entry is the head of a doubly linked list of colliding process descriptors.

Colliding Example
The following figure illustrates a PID hash table with two lists.
The processes having PIDs 2,890 and 29,384 hash into the 200th element of the table, while the process having PID 29,385 hashes into the 1,466th element of the table.

Properties of Data Structures Used in the PID Hash Tables
The data structures used in the PID hash tables must keep track of the relationships between the processes.
As an example, suppose that the kernel must retrieve all processes belonging to a given thread group, that is, all processes whose tgid field is equal to a given number.
Looking in the hash table for the given thread group number returns just one process descriptor, that is, the descriptor of the thread group leader.
To quickly retrieve the other processes in the group, the kernel must maintain a list of processes for each thread group.
The same situation arises when looking for the processes belonging to a given login session or belonging to a given process group.

Properties of a Process Descriptor's pids Field
The PID hash tables' data structures solve all these problems, because they allow the definition of a list of processes for any PID number included in a hash table.
The core data structure is an array of four pid structures embedded in the pids field of the process descriptor.

struct pid
    struct pid {
        int               nr;
        struct hlist_node pid_chain;
        struct list_head  pid_list;
    };

- nr: the PID number.
- pid_chain: the links to the next and previous elements in the hash chain list.
- pid_list: the head of the per-PID list.

The PID Hash Tables
[Figure: the PID hash tables.]

pidhash-related Functions
- do_each_task_pid(nr, type, task)
- while_each_task_pid(nr, type, task)
- find_task_by_pid_type(type, nr)
- find_task_by_pid(nr)
- attach_pid(task, type, nr)
- detach_pid(task, type)
- next_thread(task)
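The hashing and chaining scheme covered in the slides above can be sketched in user space as follows. This is a simplified illustration, not the kernel's implementation: it models a single table rather than four, struct mini_pid keeps only the nr and pid_chain fields, and every sketch_* name and pid_hash_demo() are hypothetical. The hash uses uint32_t so that the 32-bit wraparound of the multiplication is explicit on any host.

```c
#include <stddef.h>
#include <stdint.h>

#define PIDHASH_SHIFT 11
#define PIDHASH_SIZE  (1u << PIDHASH_SHIFT)   /* 2048 chain heads */

struct hlist_node {
    struct hlist_node *next, **pprev;
};

struct hlist_head {
    struct hlist_node *first;
};

/* Reduced stand-in for struct pid: the per-PID list is left out. */
struct mini_pid {
    int nr;
    struct hlist_node pid_chain;
};

/* 32-bit hash_long with bits = PIDHASH_SHIFT; uint32_t wraps modulo 2^32. */
uint32_t sketch_pid_hashfn(uint32_t nr)
{
    uint32_t hash = nr * UINT32_C(0x9e370001);

    return hash >> (32 - PIDHASH_SHIFT);
}

/* Push n at the front of the chain; pprev records who points at the node. */
void sketch_hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

void sketch_attach(struct hlist_head *table, struct mini_pid *pid)
{
    sketch_hlist_add_head(&pid->pid_chain,
                          &table[sketch_pid_hashfn((uint32_t)pid->nr)]);
}

/* Walk only the chain selected by the hash; NULL if nr is not present. */
struct mini_pid *sketch_find(struct hlist_head *table, int nr)
{
    struct hlist_node *pos;

    for (pos = table[sketch_pid_hashfn((uint32_t)nr)].first; pos; pos = pos->next) {
        struct mini_pid *pid = (struct mini_pid *)
            ((char *)pos - offsetof(struct mini_pid, pid_chain));

        if (pid->nr == nr)
            return pid;
    }
    return NULL;
}

/* Returns 1 if the colliding PIDs from the slides are all retrievable. */
int pid_hash_demo(void)
{
    struct hlist_head table[PIDHASH_SIZE] = { { NULL } };
    struct mini_pid p1 = { 2890,  { NULL, NULL } };
    struct mini_pid p2 = { 29384, { NULL, NULL } };
    struct mini_pid p3 = { 29385, { NULL, NULL } };

    sketch_attach(table, &p1);
    sketch_attach(table, &p2);
    sketch_attach(table, &p3);

    /* p1 and p2 share one chain, yet each is found by its own number. */
    if (sketch_find(table, 2890) != &p1 || sketch_find(table, 29384) != &p2)
        return 0;
    if (sketch_find(table, 29385) != &p3 || sketch_find(table, 4242) != NULL)
        return 0;
    return 1;
}
```

With zero-based indexing, sketch_pid_hashfn() maps PIDs 2890 and 29384 to index 199 (the 200th element) and PID 29385 to index 1465 (the 1,466th element), which matches the colliding example in the slides.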