The kernel’s task list
Introduction to process descriptors and their related data-structures for Linux kernel version 2.6.22

Multi-tasking
• Modern operating systems allow multiple users to share a computer’s resources
• Users are allowed to run multiple tasks
• The OS kernel must protect each task from interference by other tasks, while allowing every task to take its turn using some of the processor’s available time

Stacks and task-descriptors
• To manage multitasking, the OS needs a data-structure which can keep track of every task’s progress and its usage of the computer’s available resources (physical memory, open files, pending signals, etc.)
• Such a data-structure is called a ‘process descriptor’; every active task needs one
• Every task also needs its own ‘private’ stack

What’s on a program’s stack?
Upon entering ‘main()’:
• The program’s exit-address is on the user stack
• Command-line arguments are on the user stack
• Environment variables are on the user stack
During execution of ‘main()’:
• Function parameters and return-addresses
• Storage locations for ‘automatic’ variables

Entering the kernel…
A user process enters ‘kernel-mode’:
• when it executes a system-call
• when it is ‘interrupted’ (e.g., by the timer)
• when ‘exceptions’ occur (e.g., divide by 0)

Switching to a different stack
• Entering kernel-mode involves not only a ‘privilege-level transition’ (from level 3 to level 0), but also a stack-area ‘switch’
• This is necessary for robustness: e.g., the user-mode stack might be exhausted
• This is desirable for security: e.g., privileged data left on a stack in user space might be accessible to user code

What’s on the kernel stack?
Upon entering kernel-mode:
• The task’s registers are saved on the kernel stack (e.g., the address of the task’s user-mode stack)
During execution of kernel functions:
• Function parameters and return-addresses
• Storage locations for ‘automatic’ variables

Supporting structures
• So every task, in addition to having its own code and data, will also have a stack-area that is located in user-space, plus another stack-area that is located in kernel-space
• Each task also has a process-descriptor which is accessible only in kernel-space

A task’s virtual-memory layout
[Figure: kernel space (privilege-level 0) holds the process descriptor and kernel-mode stack; user space (privilege-level 3) holds the user-mode stack-area, the shared runtime-libraries, and the task’s code and data]

The Linux process descriptor
• Each process descriptor contains many fields, and some are pointers to other kernel structures, which may themselves include fields that point to structures
[Figure: task_struct fields (state, *stack, flags, *mm, exit_code, *user, pid, *files, *parent, *signal) pointing to mm_struct, user_struct, files_struct and signal_struct; the mm_struct’s *pgd field points to pagedir[]]

Something new in 2.6
• Linux uses part of a task’s kernel-stack page-frame to store ‘thread information’
• The thread-info includes a pointer to the task’s process-descriptor data-structure
[Figure: the task’s 8-KB, page-frame-aligned kernel-stack area holds the task’s thread-info, which points to the struct task_struct (the task’s process-descriptor)]

Tasks have ‘states’
From kernel-header <linux/sched.h>:
      #define TASK_RUNNING            0
      #define TASK_INTERRUPTIBLE      1
      #define TASK_UNINTERRUPTIBLE    2
      #define TASK_STOPPED            4
      #define TASK_TRACED             8
      #define TASK_NONINTERACTIVE     64
      #define TASK_DEAD               128

Fields in a process-descriptor
      struct task_struct {
            volatile long           state;
            void                    *stack;
            unsigned long           flags;
            struct mm_struct        *mm;
            struct thread_struct    thread;
            pid_t                   pid;
            char                    comm[16];
            /* plus many other fields */
      };

Finding a task’s ‘thread-info’
• During a task’s execution in kernel-mode, it’s very quick to find that task’s thread-info object
• Just use two assembly-language instructions:
      movl  %esp, %eax
      andl  $0xFFFFF000, %eax
• Now %eax = the thread-info’s base-address
• This mask clears the low 12 bits, which suits a 4-KB kernel stack; for the 8-KB stack-area shown earlier the mask would be $0xFFFFE000
• There’s a macro that implements this computation

Finding task-related kernel-data
• Use a macro ‘task_thread_info( task )’ to get a pointer to the ‘thread_info’ structure:
      struct thread_info *info = task_thread_info( task );
• Then one more step gets you back to the address of the task’s process-descriptor:
      struct task_struct *task = info->task;

The kernel’s ‘task-list’
• Kernel keeps a list of process descriptors
• A ‘doubly-linked’ circular list is used
• The ‘init_task’ serves as a fixed header
• Other tasks are inserted/deleted dynamically
• Tasks have forward & backward pointers, implemented as fields in the ‘tasks’ field
• To go forward: task = next_task( task );
• To go backward: task = prev_task( task );

Doubly-linked circular list
[Figure: init_task (pid=0) linked by next_task and prev_task pointers through … to the newest task, and back around to init_task]

Demo
• We can write a module that lets us create a pseudo-file (named ‘/proc/tasklist’) for viewing the list of all currently active tasks
• Our ‘tasklist.c’ module shows the name and process-ID of each task, along with that task’s current ‘state’ (0, 1, 2, 4, 8,…)
• Use the command: $ cat /proc/tasklist to display a complete list of the active tasks

Maybe a big /proc file…
• We can’t know ahead of time how many tasks are active in our system; this will depend on many varying factors, such as who else is logged in, which commands have been issued, and whether we’re using a text-mode console or a graphical desktop
• So it’s perfectly possible our pseudo-file might ‘overflow’ its kernel-supplied buffer!
How to avoid buffer-overflow
• Our module’s ‘get_info()’ callback-function has four parameter-values passed to it by the kernel:
      char   *buf      - address of a small kernel buffer
      char  **start    - address of a pointer variable
      off_t   offset   - current offset of file-pointer
      int     buflen   - size of the kernel buffer
• The initial conditions are: offset == 0 and *start == NULL
• Kernel’s behavior will vary if we modify *start

Normal case
• We expect the ‘/proc’ file to deliver a small amount of text-data, not more than would fit in the kernel-supplied buffer (e.g., 3KB)
• So we make no change to ‘*start’
• Then the kernel will deliver the data it finds in the buffer it had supplied to ‘get_info()’
• The kernel will not call ‘get_info()’ again (unless our file is closed and reopened)

Alternative case
• Our ‘get_info()’ function modifies the value of the (initially NULL) ‘*start’ pointer, for example by assigning it the address of some buffer we’ve allocated, or even the address of the kernel-buffer: *start = buf;
• In this case, the kernel will again call our module’s ‘get_info()’ function, provided we returned a nonzero function-value before!

The benefit
• Knowing about this alternative option, we can design our ‘get_info()’ function so that it delivers a big amount of data in several small-size chunks, never exceeding the size-limit of the kernel’s buffer
• We just need to think carefully about the differing scenarios under which ‘get_info()’ will be repeatedly called

First pass
• The value of ‘offset’ will be zero
• We set *start to a buffer-address where we place a positive number of data-bytes
• Kernel delivers those bytes to the ‘reader’, taking them from the *start address, then advances the file-pointer by that amount
• Kernel calls our ‘get_info()’ again, but with a non-zero ‘offset’ value this time!
Final time
• When our ‘get_info()’ function has finished delivering all the desired data to the file’s ‘reader’, and we still receive yet another ‘get_info()’ call, we simply return a function-value equal to zero, telling the kernel that the data has been exhausted, and so not to call again!

Our implementation
      struct task_struct *task;     // ‘global’ variable: its value is remembered between calls

      int my_get_info( char *buf, char **start, off_t offset, int buflen )
      {
            int   len = 0;

            if ( offset == 0 )      // our first time through this function
                  {
                  task = &init_task;      // start of circular linked-list
                  }
            else if ( task == &init_task ) return 0;  // our final pass

            // put some data into the kernel-supplied buffer
            len += sprintf( buf+len, "pid=%d \n", task->pid );

            *start = buf;                 // tell kernel where to find data, and to call again
            task = next_task( task );     // advance to next node of circular list
            return len;                   // and tell kernel how far to advance
      }

In-class exercise #1
• Different versions of the 2.6 Linux kernel use slightly different definitions for the task-related kernel data-structures (e.g., the 2.6.10 kernel used a smaller-sized ‘thread-info’ structure than the 2.6.9 kernel did)
• So, by using the C ‘sizeof’ operator, can you quickly create an LKM that will show us:
      - the size of a ‘task_struct’ object (in bytes)?
      - the size of a ‘thread_info’ object (in bytes)?

‘Kernel threads’
• Some tasks don’t have a page-directory of their own, because they don’t need one
• They only execute code, and access data, that resides in the kernel’s address space
• They can just ‘borrow’ the page-directory that belongs to another task
• These ‘kernel thread’ tasks will store the NULL-pointer value (i.e., zero) in the ‘mm’ field of their ‘task_struct’ descriptor

In-class exercise #2
• Can you modify our ‘tasklist.c’ module so it will display a list of only those tasks which are ‘kernel threads’? (i.e., task->mm == 0)
• How many ‘kernel threads’ are on your list?