Linux Project
National Central University, Department of Computer Science and Information Engineering
Second-year master's student: 江瑞敏

Outline
• How to compile the Linux kernel
• How to add a new system call
• Some example projects and ways to solve them
  – System call hooking by module
  – Project about memory
  – Project about process

Download Link
• wget https://kernel.org/pub/linux/kernel/linux-2.6.18.tar.bz2
• tar xvf linux-2.6.18.tar.bz2

The Beginning of Everything: Compile the Linux Kernel
Is it hard? No, if you understand the concepts.

The Basic Process
0. make mrproper
1. make oldconfig
2. make -j[n]
3. make modules_install
4. make install
5. reboot

Do You Know What It Means?

make mrproper
• Cleans up the source tree.
• Removes almost everything, except….

make clean
• Almost the same as make mrproper, but keeps your .config file.

make oldconfig
• Reuses an existing configuration, typically the one the running kernel was built with (for example, copied from /boot/config-`uname -r` to .config), and asks only about options that are new in this kernel.
• Some alternative options:
  – make menuconfig
  – …

Is the Config File Important?
Config file
• Determines which kind of kernel you are compiling.
• Determines which modules you want the kernel to compile.
• A misconfiguration can lead to a kernel crash.

make -j[n]
• Compiles the whole source tree according to your configuration, running n jobs in parallel.

make modules_install
• Installs the modules into the necessary folder:
  – /lib/modules/<new kernel version>/ (this matches `uname -r` once you boot the new kernel)

make install
• Installs the kernel image into the /boot directory.
• Sometimes you also need to update GRUB.

What Is a System Call?
It's a bridge between the user and the devices.

Why System Calls?
Pop quiz: write a program to print "Hello World".
What you may write
What actually happened ….
User Application → printf → libc.so → System Call → Kernel Code → Device Driver → IO Device

What If There Were No System Calls?
Every program would have to drive the hardware itself with x86 in and out instructions.

Let's Focus On …
The System Call layer between libc.so and the kernel code: the magic int 0x80.

Before We Go Further, Let's Talk About the x86 Architecture
The x86 architecture is interrupt driven: devices signal the CPU through the 8259 PIC, and the CPU transfers control to the kernel's device driver code.
(figure: User Application, Kernel, Device Driver, CPU, 8259 PIC, devices)

How Does the CPU Find the Address of the Device Driver Code?
Callback mechanism: the entries of the Interrupt Descriptor Table (IDT) lead to the kernel's device driver handlers. When a physical device raises an interrupt through the 8259 PIC, the CPU uses the interrupt number to look up the IDT entry and jumps to the handler.

How About System Calls?
Magic: int 0x80. It works the same way, except that the interrupt comes from the program itself instead of from a physical device: IDT entry 0x80 points to the system call handler, which dispatches through sys_call_table.

What the CPU Does on int 0x80
1. The user application executes int 0x80. The CPU registers (cs, ds, ss, esp, eip, …) still describe user mode.
2. Get TSS: through the GDT, the CPU finds the TSS, which tells it where the kernel stack is.
3. Get IDT: the CPU reads IDT entry 0x80, ENTRY(system_call); system_call uses the system call number in eax as an index into sys_call_table.
4. The CPU saves the user's ss, esp, eflags, cs, and eip on the kernel stack so it can return to user mode afterwards.

How To Add A System Call
Add a System Call
1. cd $kernel_src
2. Edit arch/i386/kernel/syscall_table.S
3. ….
    .long sys_tee           /* 315 */
    .long sys_vmsplice
    .long sys_move_pages
    .long sys_project       /* 318 */
• kernel.org/pub/linux/kernel

Add a System Call
• Edit linux/include/asm-i386/unistd.h
    #define __NR_vmsplice     316
    #define __NR_move_pages   317
    #define __NR_project      318

    #ifdef __KERNEL__
    #define NR_syscalls 319
• NR_syscalls must be one more than the highest system call number.

Add a System Call
• Edit linux/include/linux/syscalls.h
    asmlinkage long sys_set_robust_list(struct robust_list_head __user *head, size_t len);
    asmlinkage long sys_project(int i);
    #endif

Add a System Call
• cd linux/kernel
• touch project.c
• Add project.o to obj-y in the Makefile:
    obj-y = project.o sched.o fork.o exec_domain.o panic.o printk.o profile.o

Add a System Call
• project.c
    #include <linux/linkage.h>
    #include <linux/kernel.h>

    asmlinkage long sys_project(int i)
    {
        printk("Success!! -- %d\n", i);
        return 0;
    }

Add a System Call
• Recompile the Linux kernel.
• Reboot.
• Create a new file "test.c":
    #include <unistd.h>        /* declares syscall() */

    int main(void)
    {
        syscall(318, 2);       /* 318 = __NR_project */
        return 0;
    }
• Run dmesg; you should see "Success!! -- 2".

Add a System Call
• http://in1.csie.ncu.edu.tw/~hsufh/COURSES/FALL2007/syscall.html

About 64 Bits
• The idea is the same.
• There are many online references.
• Therefore, I will not cover it in these slides.

System Call Hooking by Module
System Call Hooking
1. Normal case: a user-mode program calls the system call NR_execve → sys_call_table → the normal execve code.
2. Hooking: the NR_execve entry in sys_call_table is changed to point to the hooking code.
3. Now the call goes sys_call_table → hooking code (a modified execve) → the normal execve code.

Source code links
• http://pastebin.com/rShUxvB5
• http://pastebin.com/KEJxgLGq

Project about Memory
Level 1: Dump the virtual addresses of a process
Some questions you may ask
Where to start? Maybe add a new system call.
1. How do you find the process you want?
Process list
• task_struct
• for_each_process()
• If you paid attention in class, these two are not strangers.
2. How about the virtual addresses being used by the current process?
The Data Structures
• mm_struct
• vm_area_struct
• task_struct → mm (an mm_struct) → mmap, a list of vm_area_struct regions, each covering vm_start to vm_end.
• Browse the source at lxr.linux.no.
How it looks (screenshot)
The rest is basic programming skill.

Too easy? Let's make it a little harder.
Level 2: Dump the physical frame associated with each virtual address.
New problem, new question: how do you translate a virtual address to a physical address?
Some reminders and hints
Where is CR3? (In Linux, the page directory that CR3 points to is the process's mm->pgd.)
Now we have CR3. Then what?
Calculate it yourself, or use something smarter: follow_page()

Push Yourself More
Level 3: Log this information to a file
OK, let's type
    dmesg | grep "myproject" >> log.txt
Dude, are you… ….
From the kernel, of course!
Can we do that???
How to write a file in user mode
• fd = open(filename, O_WRONLY | O_CREAT, 0644);
• write(fd, string, strlen(string));
• close(fd);
How about kernel mode?
• open → do_sys_open()
• write → sys_write()
• close → sys_close()
Is that all?
The magic __user
• It marks a pointer parameter as a user-space address, so the kernel checks that the buffer really lies in user space before touching it.
• It's a protection mechanism: a kernel buffer passed to sys_write() fails that check (in 2.6-era kernels, the usual workaround is set_fs(KERNEL_DS) around the call).
Final Step

About this Project
Level 4: Modify the PTE r/w flag from read/write to read-only
http://in1.csie.ncu.edu.tw/~hsufh/COURSES/FALL2012/linux_project1.html
Structures of page directory and page table entries (figure)
Wow, looks simple :D
Basic Idea
1. Walk the translation tables of the process for each virtual address.
2. After finding the PTE, change its read/write flag.
3. Done.
Code Implementation
• pte_wrprotect()
Code Implementation (cont.)
    for (loop_count = addr; loop_count < end; loop_count += PAGE_SIZE) {
        pgd = pgd_offset(mm, loop_count);
        if (pgd_none(*pgd)) {
            printk("pgd none happened\n");
            continue;
        }
        pud = pud_offset(pgd, loop_count);
        if (pud_none(*pud))
            continue;
        pmd = pmd_offset(pud, loop_count);
        if (pmd_none(*pmd))
            continue;
        /* map the PTE and take the page-table lock */
        pte = pte_offset_map_lock(mm, pmd, loop_count, &ptl);
        if (operation == 1)
            *pte = pte_mkwrite(*pte);      /* set the r/w flag */
        else
            *pte = pte_wrprotect(*pte);    /* clear the r/w flag */
        pte_unmap_unlock(pte, ptl);        /* release what pte_offset_map_lock() took */
    }
Result (screenshot)
Result (screenshot)
What!? Use printk to verify.
printk tells us two things:
1. We have changed the PTE r/w flag.
2. In most cases, only one entry was changed back; the others were not.
Magic happened?
Now, Imagine You Are the CPU
What happens when a process tries to write to a read-only area?
A page fault happens.
The question becomes: how does Linux handle a page fault?
You might ask: what is a page fault?
From the CPU's point of view
1. The present flag of the page directory entry (pgd) or the PTE is clear.
2. Code running in user mode attempts to write to a read-only page.
  – For more details, check the Intel programmer's manual.
From the kernel's point of view
1. The present flag is clear:
  A. The page is accessed for the first time.
  B. The page has been swapped out.
2. A write to a read-only page:
  A. Is the process really writing to a read-only page?
  B. Or is it a page-fault optimization such as copy-on-write?
How Does the Linux Kernel Tell These Cases Apart?
Well, first…. And this. Then this. (source walkthrough screenshots)
What the FxxK…….
This time, let's look closer.
Now we know an important thing: the Linux kernel compares the vm_flags of the vm_area_struct that contains the faulting address.
Some Useful Knowledge: How Linux Implements COW
COW?? Moo?
1. COW refers to copy-on-write.
2. Google and Wikipedia are your friends.
3. How Linux implements copy-on-write:
  A. The PTE r/w flag is disabled.
  B. vm_flags & VM_WRITE is true.
Our project accidentally matches the above conditions!
1. The same page table entry of the parent and the child process points to the same PFN.
2. The r/w flag of both PTEs is set to read-only.
3. When a page fault happens, the page fault handler checks the vm_flags of the faulting virtual address.
4. If vm_flags has VM_WRITE, the page fault handler treats the situation as a COW condition.
5. It assigns a new PFN with the r/w flag enabled if two PTEs point to the page. If only one PTE maps the page, as in our project, the handler simply makes that PTE writable again, which is why the entry that was written to got its r/w flag back.
Copy-on-write in Linux (figure): the parent's and the child's task_struct each lead to their own pgd and pte, and both ptes point to the same physical frame, PFN N (PFN N+1 and N+2 are neighboring frames).
A New Idea for the Project
1. Change the PTE r/w flag as we just did.
2. Change the vm_flags as well.
Code Implementation (mm is current->mm here)
    down_write(&mm->mmap_sem);
    vma = find_vma(mm, addr);
    if (!vma || vma->vm_start > addr) {      /* addr is not inside any mapping */
        up_write(&mm->mmap_sem);
        return -EFAULT;
    }
    vm_start = vma->vm_start;
    vm_end = vma->vm_end;
    mask = VM_READ | VM_WRITE | VM_EXEC | VM_SHARED;
    old_flags = vma->vm_flags;
    /* keep every flag except VM_WRITE, and make sure VM_READ is set */
    new_flags = VM_READ | (old_flags & ~VM_WRITE);
    prot = protection_map[new_flags & mask];
    vma->vm_flags = new_flags;
    vma->vm_page_prot = prot;
    /* walk the page tables while still holding mmap_sem */
    addr &= PAGE_MASK;
    change_pte(addr, end, operation);
    up_write(&mm->mmap_sem);
Result (screenshot)
Where is the "press enter to continue"?
It's time to use GDB.
Set a breakpoint before the syscall happens.
It seems this time printf causes the error.
Here is the problem.
Think Slowly
Calling printf needs to push some parameters onto the stack.
Recall from the last code:
• We changed vm_flags for the whole vm_area_struct, which means the entire block of linear addresses.
• The address of the array is not always aligned to 4 KB.
Consider the following conditions (figures; addresses grow from low to high):
• Start address aligned, end address aligned.
• Start address aligned, end address not aligned: 3 pages needed in total; the problem may occur in the area above the end of test_array.
• Start address not aligned, end address aligned: 3 pages needed in total; the problem may occur in the area below the start of test_array.
• Start address not aligned, end address not aligned: 4 pages needed in total; the problem may occur on both sides of test_array.
Our case (figure)
    call syscall
    push $string
    call printf
The parameter for printf is pushed right here, on a page that is now read-only, so the push faults.
Verify Our Thoughts (Test Case 1)
• Rewrite the user-mode program. This time, use malloc instead of a local variable (heap instead of stack):
    char *test_array;
    test_array = (char *)malloc(ARRAY_SIZE);
Test case 1 result (screenshot)
Verify Our Thoughts (Test Case 2)
    char test1[0x2000];
    char test_array[ARRAY_SIZE];
    char test2[0x2000];
• This also bypasses the conditions just mentioned: the 0x2000-byte (8 KB) padding arrays on each side keep printf's pushed parameters off the pages that hold test_array.
Test case 2 result: it also works~~

How About mprotect.c?
1. Basically, the idea is the same:
  – A. Change vm_flags.
  – B. Change the PTE r/w flag.
2. Some hints:
  – A. Strongly recommended: read the textbook
    • Chapter 8: Memory Management
    • Chapter 9: Process Address Space
  – B. The code that changes vm_flags is in mprotect_fixup().
  – C. The code that walks the translation tables starts from change_protection(…) → change_pud_range(…) → change_pmd_range(…) → change_pte_range(…).

Full Source
• Levels 1 and 2: http://pastebin.com/wEVLaQyg
• Level 3: http://pastebin.com/HFW8WTN5