‘Dynamic’ kernel patching How you could add your own system-calls to Linux without editing and recompiling the kernel System calls • System Calls are the basic OS mechanism for providing privileged kernel services to application programs (e.g., fork(), clone(), execve(), read(), write(), signal(), getpid(), waitpid(), gettimeofday(), setitimer(), etc.) • Linux implements over 300 system calls • To understand how system calls work, we can try creating one of our own design ‘Open Source’ philosophy • Linux source-code is publicly available • In principle, anyone could edit the sources to add their own new functions into Linux • In practice, it is inconvenient to do this • The steps needed involve reconfiguring, recompiling, and reinstalling your kernel • For novices these steps are treacherous! • Any error risks data-loss and down-time Alternative to edit/recompile • Linux modules offer an alternative method for modifying the OS kernel’s functionality • It’s safer -- and vastly more convenient – since error-recovery only needs a reboot, and minimal system knowledge suffices • The main hurdle to be overcome concerns the issue of ‘linking’ module code to some non-exported Linux kernel data-structures Invoking kernel services user-mode (restricted privileges) kernel-mode (unrestricted privileges) application program installable module call call ret standard runtime libraries ret int 0x80 Linux kernel iret The system-call jump-table • There are approximately 300 system-calls • Any specific system-call is selected by its ID-number (it’s placed into register %eax) • It would be inefficient to use if-else tests or even a switch-statement to transfer to the service-routine’s entry-point • Instead an array of function-pointers is directly accessed (using the ID-number) • This array is named ‘sys_call_table[]’ Assembly language (.data) .section .data sys_call_table: .long sys_restart_syscall .long sys_exit .long sys_fork .long sys_read .long sys_write // …etc (from ‘arch/i386/kernel/entry.S’) The ‘jump-table’ idea sys_call_table 0 sys_restart_syscall 1 sys_exit 2 sys_fork 3 sys_read 4 sys_write 5 sys_open 6 7 8 sys_close …etc… .section .text Assembly language (.text) .section .text system_call: // copy parameters from registers onto stack… call sys_call_table(, %eax, 4) jmp ret_from_sys_call ret_from_sys_call: // perform rescheduling and signal-handling… iret // return to caller (in user-mode) Changing the jump-table • To install our own system-call function, we just need to change an entry in the Linux ‘sys_call_table[]’ array, so it points to our own module function, but save the former entry somewhere (so we can restore it if we remove our module from the kernel) • But we first need to find ‘sys_call_table[]’ -- and there are two easy ways to do that Finding the jump-table • Older versions of Linux (prior to 2.4.18) used to ‘export’ the ‘sys_call_table[]’ as a global symbol, but current versions keep this table’s address private (for security) • But often during kernel-installation there is a ‘System.map’ file that gets put into the ‘/boot’ directory and – assuming it matches your compiled kernel – it holds the kernel address for the ‘sys_call_table[]’ array Using ‘uname’ and ‘grep’ • You can use the ‘uname’ command to find out which kernel-version is running: $ uname -r • Then you can use the ‘grep’ command to find ‘sys_call_table’ in your System.map file, like this: $ grep sys_call_table /boot/System.map-2.6.22.5cslabs The ‘vmlinux’ file • Your compiled kernel (uncompressed) is left in the ‘/usr/src/linux’ directory • It is an ELF-format (executable) file • It contains .text and .data sections • You can examine your ‘vmlinux’ kernel with the ‘objdump’ system-utility • You can pipe the output through the ‘grep’ utility to locate the ‘sys_call_table’ symbol Executable versus Linkable ELF Header ELF Header Program-Header Table (optional) Program-Header Table Section 1 Data Segment 1 Data Section 2 Data Segment 2 Data Section 3 Data Segment 3 Data … Section n Data … Segment n Data Section-Header Table Section-Header Table (optional) Linkable File Executable File Where is ‘sys_call_table[ ]’? • This is how you use ‘objdump’ and ‘grep’ to find the ‘sys_call_table[]’ address: $ cd /usr/src/linux $ objdump –t vmlinux | grep sys_call_table Exporting ‘sys_call_table’ • Once you know the address of your kernel’s ‘sys_call_table[]’, you can write a module to export that address to other modules, e.g.: // declare global variable unsigned long *sys_call_table; EXPORT_SYMBOL(sys_call_table); int init_module( void) { sys_call_table = (unsigned long *)0xC0251500; return 0; } Avoid hard-coded constant • You probably don’t want to ‘hard code’ the sys_call_table’s value in your module – if you ever recompile your kernel, or use a differently configured kernel, you’d have to remember to edit your module and then recompile it – or risk a corrupted system! • There’s a way to suply the required value as a module-parameter during ‘insmod’ Module paramerers char *svctable; // declare global variable module_param( svctable, charp, 0444 ); // Then you install your module like this: $ /sbin/insmod myexport.ko svctable=c0251500 // Linux will assign the address of your input string “c0251500” to the ‘svctable’ pointer: simple_strtoul() • There is a kernel function you can use, in your ‘init_module()’ function, that will convert a string of hexadecimal digits into an ‘unsigned long’’: int init_module( void ) { unsigned long myval; myval = simple_strtoul( svctable, NULL, 16 ); sys_call_table = (unsigned long *)myval; return 0; } Shell scripts • It’s inconvenient – and risks typing errors – if you must manually search ‘vmlinux’ and then type in the sys_call_table[]’s address every time you want to install your module • Fortunately this sequence of steps can be readily automated – by using a shell-script • We have created an example: ‘myscript’ shell-script format • First line: #!/bin/sh • Some assignment-statements: version=$(uname –r) mapfile=/boot/System.map-$version • Some commands (useful while debugging) echo $version echo $mapfile The ‘cut’ command • You can use the ‘cut’ operation on a line of text to remove the parts you don’t want • An output-line from the ‘grep’ program can be piped in as a input-line to ‘cut’ • You supply a command-line argument to the ‘cut’ program, to tell it which parts of the character-array you wish to retain: – For example: cut –c0-8 – Only characters 0 through 8 will be retained Finishing up • Our ‘myscript’ concludes by executing the command which installs our ‘myexport.o’ module into the kernel, and automatically supplies the required module-parameter • If your ‘/boot’ directory doesn’t happen to have the ‘System.map’ file in it, you can extract the ‘sys_call_table[]’ address from the uncompressed ‘vmlinux’ kernel-binary The ‘objdump’ program • The ‘vmlinux’ file contains a Symbol-Table section that includes ‘sys_call_table’ • You can display that Symbol-Table using the ‘objdump’ command with the –t flag: $ objdump –t /usr/src/linux/vmlinux • You can pipe the output into ‘grep’ to find the ‘sys_call_table’ symbol-value • You can use ‘cut’ to isolate the address Which entry can we change? • We would not want to risk disrupting the normal Linux behavior through unintended alterations of some vital system-service • But a few entries in ‘sys_call_table[]’ are no longer being used by the newer kernels • If documented as being ‘obsolete’ it would be reasonably safe for us to ‘reuse’ an array-entry for our own purposes • For example: system-call 17 is ‘obsolete’ ‘newcall.c’ • We created this module to demonstrate the ‘dynamic kernel patching’ technique • It installs a function for system-call 17 • This function increments the value stored in a variable of type ‘int’ whose address is supplied as a function-argument • We wrote the ‘try17.cpp’ demo to test it! Recently… an extra obstacle • Some recent versions of the Linux kernel (including ours in the classroom and labs) have placed the ‘sys_call_table[]’ (as the default configuration-option) in ‘read-only’ memory within kernel-space, despite the already existing protections of ‘ring 0’ • What this achieves is creation of an added obstacle to alterations by privileged-code page-frame attributes virtual memory address of our ‘sys_call_table[]’ array 0xC0251500 = 2 1 1100 0000 00 10 0101 0001 0101 0000 0000 sys_ call_ table 0 S R / / P U W frame attributes CR3 Page-Directory Page-Tables Page-Frame We cannot modify entries in ‘sys_call_table[]’ unless its page-frame is ‘writable’ Tweak page-frame attributes • Our ‘newcall.c’ module needs to be sure it can modify entry 17 in ‘sys_call_table[]’ • So it locates the page-table entry for the page-frame containing ‘sys_call_table[]’ and sets its ‘writable’ bit to be ‘TRUE’ • But it preserves the previous value of that entry, so it can be restored if we remove our ‘newcall.ko’ object from the kernel In-class exercise #1 • Write a kernel module (named ‘unused.c’) which will create a pseudo-file that reports how many ‘unimplemented’ system-calls are still available. The total number of locations in the ‘sys_call_table[]’ array is given by a defined constant: NR_syscalls so you can just search the array to count how many entries match ‘sys_ni_syscall’ (it’s the value found initially in location 17)