USER ACTIVITIES MONITORING SYSTEM USING LKM

Bhaumik Patel
B.E., Sardar Patel University, INDIA, 2008

PROJECT

Submitted in partial satisfaction of the requirements for the degree of

MASTER OF SCIENCE

in

COMPUTER SCIENCE

at

CALIFORNIA STATE UNIVERSITY, SACRAMENTO

FALL 2011

A Project by Bhaumik Patel

Approved by:
Jinsong Ouyang, Ph.D., Committee Chair
Chung-E Wang, Ph.D., Second Reader

Student: Bhaumik Patel

I certify that this student has met the requirements for format contained in the University format manual, and that this project is suitable for shelving in the Library and credit is to be awarded for the Project.

Nikrouz Faroughi, Ph.D., Graduate Coordinator
Department of Computer Science

Abstract
of
USER ACTIVITIES MONITORING SYSTEM USING LKM
by
Bhaumik Patel

Security is one of the major challenges when a single machine is shared among multiple users. Linux is an operating system that supports multiple users. Users can access different files in the system, either on the local machine or over a network connection. An inappropriate action by one user can cause a system failure or other unexpected problems, so the activities of all users must be monitored in order to identify the exact cause of a failure and the user responsible for it. In the operating system, the system call is the only window through which a user process can enter the kernel and access the resources it provides. The Linux Security Module (LSM) is a Loadable Kernel Module (LKM) that intercepts the file i/o and network i/o system calls in order to log valuable information. It adds a layer between the user process and the actual system call by replacing the actual system call with a spy system call.
The LSM supports 32 bit machines and older versions of the kernel. As 64 bit machines are common today, LSM needs to be ported to 64 bit machines and the latest kernel source. Porting LSM to the latest hardware and kernel requires changes, and LSM must be upgraded to match the current system call structure. The User Activities Monitoring System using LKM includes the upgraded LSM as a system layer utility, which hacks the file i/o and network i/o system calls and generates log files from the gathered information. As an application layer utility, it also includes an automation system, which filters the data in the log files and inserts it into a database, and a GUI based web interface to query the database and generate reports for the system administrator. The entire system helps to monitor user activities both on the local machine and on the network. Using this tool, the system administrator can trace file i/o and network i/o, so if the system goes down, the administrator can investigate the activities of different users and explore the actual reason for the crash.

DEDICATION

Dedicated to my loving parents who inspired me to work hard and my brother for his constant support.

ACKNOWLEDGMENTS

I would like to thank everyone who encouraged and motivated me throughout my Master's project. I am grateful to Dr. Jinsong Ouyang for his constant guidance and useful advice. He helped me a lot in identifying different problems and in building the solution step by step. He also provided me with reading material such as books and published papers to explore the Linux kernel further. I would also like to thank Prof. Chung-E Wang for being a driving force behind my interest in algorithms and for his continuous support.
Furthermore, I would like to thank the entire Linux kernel developer community, which helped me a lot with guidance on kernel module development. Linux developers around the world are very active in answering questions in discussion forums, and this always helped me to solve problems. Finally, I would like to thank my parents and all my friends for their full support and motivation in completing the project.

TABLE OF CONTENTS

Dedication
Acknowledgments
List of Tables
List of Figures
Chapter
1. INTRODUCTION
   1.1 Overview
   1.2 Objectives
2. LOADABLE KERNEL MODULE (LKM)
   2.1 What is LKM?
   2.2 Basic Structure of LKM
   2.3 System Call
   2.4 LSM (Linux Security Module)
   2.5 Porting
3. KERNEL INSTRUMENTATION FOR MONITORING FILE I/O AND NETWORK I/O
   3.1 System part overview
   3.2 Basic pseudo code for LKM
   3.3 Locating System call Table
   3.4 Overwriting the addresses of system call
   3.5 Writing a new system call
   3.6 Cleanup part of the module
   3.7 Output of phase-I
4. A WEB-BASED APPLICATION FOR MONITORING FILE I/O AND NETWORK I/O
   4.1 Why filtering is required?
   4.2 Automation Schema
   4.3 Filtering script
   4.4 Insert data script
   4.5 Database design
   4.6 Web interface
5. SUMMARY
   5.1 Summary
6. FUTURE WORK
   6.1 Future work
References

LIST OF TABLES

Table 1 System calls related to file i/o and network i/o
Table 2 LKM commands and their functionality
Table 3 Comparison of system call structure between 32 and 64 bit machine
Table 4 System calls and corresponding system call numbers
Table 5 Information captured by hacking file i/o and network i/o

LIST OF FIGURES

Figure 1 Overall structure of system and position of system call
Figure 2 Automation schema
Figure 3 Screenshot of Form to query the database
Figure 4 Screenshot of result of query

Chapter 1
INTRODUCTION

1.1 Overview

Linux is a multi-user operating system. Many users can share a single Linux machine and access it either locally or through the network. In Linux, all file system requests and network requests are satisfied by system calls. System calls are the only window for user processes to enter the kernel and use the shared resources in a controlled manner. Each request invokes the corresponding system call, and the kernel provides the specific service for that system call. To provide security at the operating system level, we can alter the code of a system call and customize it to run our own code. Then every file i/o or network i/o request passes through the modified system call, where we can gather the information required for security purposes. For file i/o, useful information to log includes who is asking for file access, the purpose of the request (reading or writing), the time stamp of the request and the absolute path of the requested file. For network i/o, useful information to log includes who is invoking the network request, the time stamp, the name of the file involved, the source and destination IP addresses, and the source and destination port numbers.

A Loadable Kernel Module (LKM) is very helpful for hacking system calls. Using an LKM, we can add our own system calls to the kernel as well as modify existing ones.
The Linux Security Module (LSM) is itself an LKM, which hacks different file i/o and network i/o system calls. LSM was originally developed by Tushar Dave in 2007. It was designed for 32 bit machines and supports older versions of the Linux kernel. Today most machines are 64 bit and the Linux kernel source has advanced considerably, so LSM needs to be ported to the latest Linux kernel source and to 64 bit machines.

1.2 Objectives

The first objective is to write a Loadable Kernel Module (LKM) for a 64 bit machine that hacks the file i/o and network i/o system calls, so that every request to an actual system call goes through the replacement system call. Within the replacement system call, we can extract the required information and write it into a log file. Two separate log files are created, one for file i/o and another for network i/o. The corresponding system calls are listed in Table 1.

Table 1 System calls related to file i/o and network i/o

File System Calls     Network System Calls
open()                connect()
read()                accept()
write()               sendto()
                      recvfrom()

Once all of the above system calls are hacked and the LKM starts generating log files, the log files will contain the user activity data that we want to gather. We can then use automation to filter the log file data. Because the log files record every request to these system calls, we may not be interested in the entire log, and the log files can also become very large. The filtering step therefore extracts only the information that we are interested in, after which the log file can be removed from the disk. A mechanism is also required to query the filtered data: another part of the automation script periodically loads the filtered log data into a database, where it can be maintained properly.
On top of that, a web interface with a GUI is required to query the data and generate reports for the system administrator.

Chapter 2
LOADABLE KERNEL MODULE (LKM)

2.1 What is LKM?

A loadable kernel module is a way of extending the kernel. There are two ways of changing the code in the Linux kernel:
[1] Changing the kernel code itself and rebuilding the kernel
[2] Writing a loadable kernel module

The advantage of an LKM over the first method is that the kernel does not need to be recompiled and rebuilt: an LKM can be loaded into and unloaded from the running kernel dynamically. It can provide many of the advantages of a microkernel without additional performance penalties. Almost every component of Linux, such as device drivers, system call related modules and executable formats, can be written as a kernel module. Loadable modules are also supported by most other operating systems, such as Microsoft Windows, FreeBSD and Mac OS. LKMs have many advantages over changing the base kernel, and an LKM can also help to diagnose system problems. An LKM can save memory because it only needs to be loaded while it is actually needed. After a module is loaded, the kernel maintains a usage count for it, which indicates how many users (such as other modules or processes) currently depend on the module. A module can be unloaded only if its usage count is 0. Modules can also depend on each other: if module B depends on module A, then module A must be loaded before module B.

2.2 Basic Structure of LKM

A minimal skeleton of an LKM is given below.

    #include <linux/module.h>   /* required by every module */
    #include <linux/kernel.h>   /* printk() and KERN_* log levels */

    /* Entry point: called when the module is loaded */
    int init_module(void)
    {
        /* code to perform the operation inside the kernel */
        return 0;   /* 0 means the module loaded successfully */
    }

    /* Exit point: called when the module is unloaded */
    void cleanup_module(void)
    {
        /* undo everything done in init_module() */
    }

    MODULE_LICENSE("GPL");

Every kernel module has two parts: entry and exit.
In the code above, init_module() is the entry part. It is invoked when the module is loaded into the kernel, and it contains the code that performs operations inside the kernel; it may add new capabilities or modify existing ones. The exit part is cleanup_module(), which is invoked when the module is unloaded from the kernel. It usually contains the cleanup code, so that whatever was done in the init part is undone in the cleanup part. Because an LKM uses system resources, it must be written very carefully, and the cleanup part must be the exact reverse of the init part: every resource acquired in the init part must be released in the cleanup part.

Table 2 lists some commands related to LKMs that can be invoked from a Linux terminal.

Table 2 LKM commands and their functionality

Command     Functionality
insmod      Load a module into the kernel
rmmod       Remove (unload) a module from the kernel
modprobe    Automatically detect dependencies and insert/remove modules accordingly
depmod      Determine the interdependencies between multiple LKMs
lsmod       List all currently loaded modules
modinfo     Print information about one or more modules

2.3 System Call

[Figure 1: Overall structure of system and position of system call (user processes in user mode; system calls, interrupts and exceptions; kernel in kernel mode; hardware)]

As Figure 1 shows, the system call is the only passage for a user process into the kernel. User processes do not have direct access to kernel resources, but through system calls they can ask the kernel to act on their behalf to access the underlying resources. In Linux, the entire file system is also considered a valuable resource, and no user process is allowed to access the file system directly.
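From the user side, a system call is requested by its number. The sketch below is only an illustration, not part of the module: it uses the glibc syscall() wrapper to invoke write() through the SYS_write constant from <sys/syscall.h> instead of the usual library function. The helper name write_by_number() is hypothetical.

```c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Ask the kernel to write msg to standard output by passing the
 * write() system call number (SYS_write, i.e. __NR_write) explicitly.
 * Returns the number of bytes written, or -1 on error. */
long write_by_number(const char *msg)
{
    return syscall(SYS_write, STDOUT_FILENO, msg, strlen(msg));
}
```

For example, write_by_number("hello\n") prints hello and should return 6. On x86_64, SYS_write is 1, the number listed for write() in Table 4; the kernel uses that number to pick the handler out of the system call table.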
The integrity and consistency of the file system are important, and the kernel controls each request to a file. There are mainly three system calls that deal with the file system: open(), read() and write(). A user process must invoke the open() system call to request a specific file, and only after successfully opening the file can it invoke read() to read from the file and write() to write into it. By hacking the open() system call, we can install our own system call to handle open requests. Then, every time a user process wants to read from a file, it invokes open(), but our own system call is executed instead. In our system call we can use the request and the information about the user to extract additional data, store that data in a log file, and then continue by invoking the actual system call.

The Linux kernel also has many system calls for network requests, such as connect(), accept(), sendto() and recvfrom(). connect() is invoked by a client process, which may be running on another machine in the network. To accept the client connection, the server process invokes the accept() system call. After the connection has been established, both processes can communicate using the sendto() and recvfrom() system calls. The same trick can be used to hack the network system calls: after extracting the information from a request, we invoke the actual system call, so user processes on the network do not notice that the system call has been replaced.

2.4 LSM (Linux Security Module)

The Linux Security Module is a similar effort done in the past. LSM was developed in 2007 and supports 32 bit operating systems and older versions of the kernel. Many versions of the kernel source have been released since then, and people have started using 64 bit machines.
64 bit machines are now common, so LSM needs to be updated to support the latest kernel source and 64 bit machines.

2.5 Porting

Porting requires the LKM for system call hacking to be modified so that it works on the latest 64 bit hardware and the latest kernel source code. Several changes are required for the LKM to work on a 64 bit machine. The header file unistd.h provides the system call numbers. When this file is included, it includes unistd_32.h on a 32 bit machine and unistd_64.h on a 64 bit machine, depending on the underlying architecture; in our case, unistd_64.h is used. The two files assign different numbers to the same system call. For example, __NR_open has the value 5 in unistd_32.h and the value 2 in unistd_64.h. This does not affect the LKM, because the LKM refers to the open system call number only through the symbolic name __NR_open.

However, the structure of the system calls involved in network i/o differs between the two files. In the previous implementation of the LKM, the only system call that needed to be hacked to log network i/o was sys_socketcall(), whose system call number, __NR_socketcall, was 102. On 32 bit machines, all network related system call requests are routed through sys_socketcall(), whose body contains a big switch statement that identifies the specific request and invokes the corresponding function to satisfy it. On 64 bit machines, unistd_64.h does not define __NR_socketcall, so each network related system call has to be hacked separately.

Table 3 Comparison of system call structure between 32 and 64 bit machine

32 bit machine:

    sys_socketcall(int call, unsigned long *args)
    {
        switch (call) {
        case SYS_CONNECT: call sys_connect(); break;
        case SYS_ACCEPT:  call sys_accept();  break;
        case SYS_SENDTO:  call sys_sendto();  break;
        ....
        }
    }

64 bit machine:

    sys_connect(....) { ... }
    sys_accept(....)  { ... }
    sys_sendto(...)   { ... }
    ....

Each new kernel version may add new system calls or change the structure of existing ones. Porting is therefore necessary whenever the kernel changes, to ensure that the module still provides the same facilities.

Chapter 3
KERNEL INSTRUMENTATION FOR MONITORING FILE I/O AND NETWORK I/O

3.1 System part overview

The system part of the User Activities Monitoring System using LKM consists of an LKM that hacks the file i/o and network i/o system calls and generates a log file for each. Once the module has been written, it is loaded on the Linux machine. When loaded, the module replaces the current system calls with the customized system calls, so all requests are diverted to our own system calls, which extract information from each request and write it into the log file. Log files are named after the current date: if the date is 10/23/2011, the module generates two log files, one for file i/o and one for network i/o, each named "10_23_2011". To analyze today's file i/o and network i/o, we look for the files named with today's date.
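The date-based naming scheme can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the module's code: make_log_filename() and todays_log_filename() are hypothetical helpers, and zero padding of single-digit months and days is an assumption (the text only shows "10_23_2011"). Inside the kernel the C library is not available, so the module itself has to derive the date from the seconds count instead of calling localtime().

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Build a log file name of the form MM_DD_YYYY (e.g. "10_23_2011")
 * from a broken-down date. Returns 0 on success, -1 if buf is too small. */
int make_log_filename(const struct tm *date, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%02d_%02d_%04d",
                     date->tm_mon + 1,          /* tm_mon counts from 0 */
                     date->tm_mday,
                     date->tm_year + 1900);     /* tm_year counts from 1900 */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Name of today's log file, based on the local date. */
int todays_log_filename(char *buf, size_t len)
{
    time_t now = time(NULL);
    struct tm *today = localtime(&now);
    if (today == NULL)
        return -1;
    return make_log_filename(today, buf, len);
}
```

Because both log files share the same name, they have to live in different directories (the file i/o log path used later in this chapter is /home/bk/output/fileio/).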
3.2 Basic pseudo code for LKM

    #include header files

    init()
    {
        Locate the system call table (base address);
        Get the addresses of the different system calls;
        Save the original addresses of all the system calls;
        Overwrite the addresses with our new_system_calls;
    }

    cleanup()
    {
        Retrieve the original addresses from where they were stored
        and write them back into the system call table;
        (undo everything done in init)
    }

    new_system_call(parameters)
    {
        Log the information into the log files;
        Invoke the actual system call;
    }

3.3 Locating System call Table

Following the pseudo code, we first have to locate the system call table inside the operating system. The system call table maps each system call number to the address of the function that implements that system call. To invoke a system call, a user process executes a trap instruction (the software interrupt int 0x80 on 32 bit x86, the syscall instruction on x86_64) and passes the system call number to the kernel. The kernel uses this number as an index into the system call table, extracts the address of that particular system call, and jumps to that address to execute it. When the system call has finished, execution switches back from kernel mode to user mode.

The mapping from system calls to numbers can be found in the unistd.h file of the system: for a 32 bit machine the file is named unistd_32.h, and for a 64 bit machine it is named unistd_64.h. Table 4 shows some system calls and their numbers on a 64 bit OS, as defined in unistd_64.h in the Linux source.
Table 4 System calls and corresponding system call numbers

System call     System call number
read()          0
write()         1
open()          2
close()         3
connect()       42
accept()        43
sendto()        44
recvfrom()      45

There are three methods to locate the system call table:

[1] Using the exported symbol sys_call_table

In Linux kernel version 2.4, there is an exported symbol called "sys_call_table", and a module can use any exported symbol directly. So in kernel 2.4, we can simply read the value of sys_call_table to get the address of the system call table. For security reasons, sys_call_table is no longer exported in kernel 2.6. Since we are using kernel 2.6, we cannot use this method, but we can use the other two.

[2] Brute force scanning of the kernel memory range

For this method, we write a small LKM that scans the kernel memory to find the location of the system call table. For a 32 bit OS, the kernel memory range scanned is from 0xc0000000 to 0xd0000000.

    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/init.h>
    #include <linux/syscalls.h>   /* declares sys_close() */
    #include <asm/unistd.h>       /* __NR_close */

    /* 32 bit kernel memory range scanned for the table */
    #define START_MEM 0xc0000000UL
    #define END_MEM   0xd0000000UL

    unsigned long *syscall_table;

    /* Return the address whose entry at index __NR_close equals sys_close */
    static unsigned long **find(void)
    {
        unsigned long **sctable;
        unsigned long i = START_MEM;

        while (i < END_MEM) {
            sctable = (unsigned long **)i;
            if (sctable[__NR_close] == (unsigned long *)sys_close)
                return &sctable[0];
            i += sizeof(void *);
        }
        return NULL;   /* table not found in the scanned range */
    }

    static int __init find_init(void)
    {
        printk(KERN_INFO "Module starting...\n");
        syscall_table = (unsigned long *)find();
        if (syscall_table != NULL)
            printk(KERN_INFO "system call table found at %p\n", syscall_table);
        return 0;
    }

    static void __exit find_exit(void)
    {
        printk(KERN_INFO "Module exiting\n");
    }

    module_init(find_init);
    module_exit(find_exit);
    MODULE_LICENSE("GPL");

Inside the init function, we call find(), which scans the kernel memory range to find the address of sys_call_table. To do that, it uses the sys_close symbol, which is still exported in kernel 2.6. That symbol is the address of the function that implements the close() system call.
So we look for the base address at which the entry with index __NR_close equals the address of sys_close. As soon as this condition is satisfied, we know the actual base address of the table.

[3] Using the grep command

We can use the grep command to search for the sys_call_table keyword in the file /boot/System.map-2.6.35-22-generic, where 2.6.35 is the kernel version. grep prints a single line on the console that includes the address of sys_call_table. The exact command and its result on 64 bit and 32 bit systems are shown below.

64 bit:

    bk@ubuntu:~/lkm$ grep sys_call_table /boot/System.map-2.6.35-22-generic
    ffffffff81600300 R sys_call_table

32 bit:

    bk@ubuntu:~/lkm$ grep sys_call_table /boot/System.map-2.6.35-22-generic
    c05d2180 R sys_call_table

This method works for both 32 bit and 64 bit operating systems. It is more reliable than the previous method, which would stop working if the kernel developers decided not to export the sys_close symbol in the future. Note that the address is specific to the kernel build that the System.map file belongs to, so it has to be looked up again whenever the kernel changes.

3.4 Overwriting the addresses of system call

Once the system call table has been located, we have its base address, and from it we can get the addresses of all the system calls supported by the operating system. Before changing anything, we save the original addresses so that they can be restored in the cleanup part, which must restore the original system call table. The statement below assigns the base address of the system call table to the variable syscall_table, which we then use to refer to the table.

    unsigned long *syscall_table = (unsigned long *) 0xffffffff81600300;

Once the variable is set, we can use it as the base to locate any system call we need.
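The grep step of method [3] can also be automated. The following user-space sketch is not part of the module and its name, lookup_symbol(), is hypothetical: it parses System.map lines of the "address type name" form shown above and returns the address of the requested symbol, which could then be used as the constant assigned to syscall_table.

```c
#include <stdio.h>
#include <string.h>

/* Look up a symbol in a System.map file, whose lines have the form
 *   <hex address> <type> <name>
 * e.g. "ffffffff81600300 R sys_call_table".
 * Stores the address in *addr and returns 0, or returns -1 if the
 * file cannot be opened or the symbol is absent. */
int lookup_symbol(const char *map_path, const char *symbol,
                  unsigned long long *addr)
{
    FILE *fp = fopen(map_path, "r");
    char line[256], type, name[200];
    unsigned long long value;
    int found = -1;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        /* Parse the three columns; skip malformed lines */
        if (sscanf(line, "%llx %c %199s", &value, &type, name) == 3 &&
            strcmp(name, symbol) == 0) {
            *addr = value;
            found = 0;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

For example, lookup_symbol("/boot/System.map-2.6.35-22-generic", "sys_call_table", &addr) would yield 0xffffffff81600300 on the 64 bit system shown above. As with grep, the result is only valid for the kernel build that the map file describes.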
For file i/o, we locate the read(), write(), open() and close() system calls and save their addresses, as the following code does.

    original_write = (void *)syscall_table[__NR_write];
    original_read  = (void *)syscall_table[__NR_read];
    original_close = (void *)syscall_table[__NR_close];
    original_open  = (void *)syscall_table[__NR_open];

Here __NR_write, __NR_read, __NR_close and __NR_open are the system call numbers of the write, read, close and open system calls. In the same way, we locate network system calls such as getsockname(), getpeername(), connect(), accept(), sendto() and recvfrom().

    original_getsockname = (void *)syscall_table[__NR_getsockname];
    original_getpeername = (void *)syscall_table[__NR_getpeername];
    original_connect     = (void *)syscall_table[__NR_connect];
    original_accept      = (void *)syscall_table[__NR_accept];
    original_recvfrom    = (void *)syscall_table[__NR_recvfrom];
    original_sendto      = (void *)syscall_table[__NR_sendto];

The variables named original_* keep the original system call addresses, so a new system call can invoke any original system call through them. After saving the original addresses, we overwrite the table entries with the addresses of the new system calls. Only the entries of the system calls being replaced are overwritten. The following code overwrites the addresses of a few system calls; the casts are needed because the table stores addresses as unsigned long values.

    syscall_table[__NR_open]     = (unsigned long)new_open;
    syscall_table[__NR_write]    = (unsigned long)new_write;
    syscall_table[__NR_read]     = (unsigned long)new_read;
    syscall_table[__NR_sendto]   = (unsigned long)new_sendto;
    syscall_table[__NR_recvfrom] = (unsigned long)new_recvfrom;
    syscall_table[__NR_connect]  = (unsigned long)new_connect;
    syscall_table[__NR_accept]   = (unsigned long)new_accept;

Here each new_* token is the address of a new system call that is executed in place of the original one.
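The save, overwrite and restore steps can be modeled safely in user space. In this sketch all names are hypothetical and an ordinary array of function pointers stands in for the system call table: install_hook() plays the role of init(), remove_hook() the role of cleanup(), and new_double() the role of a new system call that records the request and then calls the saved original.

```c
/* A user-space model of system call table hooking: a table of function
 * pointers indexed by a call number, in which one entry is saved,
 * replaced by a wrapper that records the call, and later restored. */
typedef int (*handler_t)(int);

enum { NR_DOUBLE = 0, NR_NEGATE = 1, NR_MAX = 2 };

static int real_double(int x) { return 2 * x; }
static int real_negate(int x) { return -x; }

/* Stand-in for sys_call_table. */
static handler_t call_table[NR_MAX] = { real_double, real_negate };

static handler_t original_double;   /* saved original entry */
static int hook_count;              /* number of intercepted calls */

/* Replacement handler: record the request, then invoke the original. */
static int new_double(int x)
{
    hook_count++;
    return original_double(x);
}

/* Equivalent of init(): save the original address, then overwrite it. */
void install_hook(void)
{
    original_double = call_table[NR_DOUBLE];
    call_table[NR_DOUBLE] = new_double;
}

/* Equivalent of cleanup(): write the saved address back. */
void remove_hook(void)
{
    call_table[NR_DOUBLE] = original_double;
}

/* Dispatch by number, as the kernel does with the system call table. */
int dispatch(int nr, int arg)
{
    return call_table[nr](arg);
}

int get_hook_count(void) { return hook_count; }
```

Callers of dispatch() see the same results before, during and after hooking; only hook_count reveals the interception, which mirrors why user processes do not notice the replaced system calls. The kernel table holds unsigned long values rather than typed pointers, which is why the module needs casts where this model does not.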
3.5 Writing a new system call

The following code is inserted as a new system call. The first example replaces the open system call.

    asmlinkage int new_open(const char __user *filename, int flags, int mode)
    {
        char fileinfo_buff[200], path[120];
        char kname[120];   /* kernel copy of the user-space file name */
        long len;
        int ret;

        /* Read the monitored directory from the configuration file once */
        if (first_open == 0) {
            read_config_file();
            first_open = 1;
        }

        /* filename points into user space, so copy it before using strstr() */
        len = strncpy_from_user(kname, filename, sizeof(kname) - 1);
        if (len < 0)
            len = 0;          /* treat a bad pointer as an empty name */
        kname[len] = '\0';

        if (len > 0 && strstr(kname, filter_list)) {
            print_time(USER_TIME);                          /* get current time */
            strlcpy(fileinfo_buff, USER_TIME + 1, sizeof(fileinfo_buff));
            ret = get_username(USER_NAME);
            if (ret < 0)
                printk(KERN_ALERT "\n error in get_username");
            else
                strlcat(fileinfo_buff, USER_NAME, sizeof(fileinfo_buff));

            /* O_RDWR opens do not match this test and are logged as RD */
            if (flags & (O_WRONLY | O_APPEND))
                strlcat(fileinfo_buff, "#WR#", sizeof(fileinfo_buff));
            else
                strlcat(fileinfo_buff, "#RD#", sizeof(fileinfo_buff));
            strlcat(fileinfo_buff, kname, sizeof(fileinfo_buff));
            strlcat(fileinfo_buff, "\n", sizeof(fileinfo_buff));

            strlcpy(path, "/home/bk/output/fileio/", sizeof(path));
            strlcat(path, log_filename, sizeof(path));

            /* Log only when the user name starts with a letter */
            if ((USER_NAME[0] >= 'A' && USER_NAME[0] <= 'Z') ||
                (USER_NAME[0] >= 'a' && USER_NAME[0] <= 'z'))
                write_file(fileinfo_buff, path);
        }
        return (*original_open)(filename, flags, mode);
    }

The new system call must receive exactly the same parameters as the original system call, in the same order; in other words, both functions must have the same signature. The new system call first checks the variable first_open. Its initial value is 0, so the condition is satisfied only on the first call and not on any subsequent call. Within this 'if' block, it calls another function, read_config_file(). The code for that function is given below.
    void read_config_file(void)
    {
        mm_segment_t old_fs;
        int fd, i;
        char buf[10];
        char filename[50] = "/home/bk/config.txt";

        /* Allow the original system calls to accept kernel-space buffers */
        old_fs = get_fs();
        set_fs(KERNEL_DS);

        i = 0;
        fd = original_open(filename, O_RDONLY, 0);
        if (fd >= 0) {
            /* Copy the first line byte by byte into filter_list, which is
             * assumed to be a global char array; the newline is not copied */
            while (i < sizeof(filter_list) - 1 &&
                   original_read(fd, buf, 1) == 1 && buf[0] != '\n') {
                printk("%c", buf[0]);
                filter_list[i] = buf[0];
                i++;
            }
            filter_list[i] = '\0';
            printk("\n");
            original_close(fd);
        }
        set_fs(old_fs);
    }

read_config_file() reads the configuration file /home/bk/config.txt and stores its content in a buffer named filter_list. The configuration file contains the absolute path of the directory to be monitored for file i/o, for example:

    /home/bk/Desktop/secret

After calling read_config_file(), the new open() system call checks another condition: whether filter_list, the content of the configuration file, is a substring of the file name passed to the open system call (the strstr() check). File names that do not satisfy this condition are not logged, and the original open system call is simply called for them. File names that do satisfy it belong to the files whose i/o the system administrator wants to log. This acts as an internal filter. Whenever the administrator wants to monitor different directories, the configuration file has to be modified and the module has to be reloaded, so that it reads the new configuration file and uses the new filter_list for the internal filter.

After the filter condition is satisfied, the new system call invokes a function named print_time(). This function calls the kernel function do_gettimeofday(), which fills in the current time as a number of seconds (and microseconds) since the epoch, so further calculation is required to convert it into H:M:S format, where H is hours, M is minutes and S is seconds. The new system call then puts the time stamp into the buffer.
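The code of print_time() is not shown here, so the following user-space sketch only illustrates the arithmetic that the conversion needs; seconds_to_hms() and format_hms() are hypothetical names. It assumes the input is the tv_sec value from do_gettimeofday(), which counts seconds since the epoch in UTC, so the result is UTC time of day unless a time zone offset is added first.

```c
#include <stdio.h>
#include <string.h>

/* Convert a seconds-since-epoch count, such as the tv_sec field filled
 * in by do_gettimeofday(), into the hour, minute and second of the day.
 * The result is UTC because the count carries no time zone. */
void seconds_to_hms(long long total, int *h, int *m, int *s)
{
    long long of_day = total % 86400;   /* seconds since midnight */
    if (of_day < 0)
        of_day += 86400;                 /* guard against negative input */
    *h = (int)(of_day / 3600);
    *m = (int)((of_day % 3600) / 60);
    *s = (int)(of_day % 60);
}

/* Format the time of day as "HH:MM:SS" into buf (at least 9 bytes). */
void format_hms(long long total, char *buf, size_t len)
{
    int h, m, s;
    seconds_to_hms(total, &h, &m, &s);
    snprintf(buf, len, "%02d:%02d:%02d", h, m, s);
}
```

For example, 1319328000 is midnight UTC on 10/23/2011, so format_hms(1319328000 + 3723, buf, sizeof buf) produces "01:02:03".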
Next, the new system call calls another function, get_username(). This function reads the id of the current process from current->pid. After retrieving the process id (pid), it reads the file /proc/pid_value/environ, where pid_value is the process id. This file contains all the environment variables of the process; one of them is USERNAME, which names the user running the process. The username is an important piece of information to be logged, so after retrieving it the system call appends it to the buffer that already holds the time stamp.

The system call then analyzes the flags parameter. If O_WRONLY or O_APPEND is set in flags, the call is treated as a write and recorded as 'WR' in the buffer. Otherwise the operation is treated as a read and recorded as 'RD' (with this test, a file opened with O_RDWR alone is recorded as a read). Then it appends the filename parameter, the path of the file that is going to be read or written. The buffer now carries all the information to be stored in the log file. To write it into the log file, the system call invokes another function:

write_file(fileinfo_buff, path);

Here fileinfo_buff is the buffer holding all the required information and path is the absolute path where the log file should be created. This function opens the log file and writes the buffer into it using the original write() system call; after writing, it closes the file using the original close() system call. The following code calls the original system calls and passes the required parameters.
    /* Open (or create) the log file in append mode and add one entry */
    fd = original_open(path, O_WRONLY | O_CREAT | O_APPEND, 0777);
    if (fd >= 0) {
        original_write(fd, buffer, strlen(buffer));
        original_close(fd);
    } else {
        printk(KERN_ALERT "\n Error in write_file() while opening a file");
    }

In this way the open() system call is replaced with new_open(), so every request to open() is redirected to new_open(). Inside the new system call, we extract the required information from the parameters, log it into a file using the original system calls, and finally hand the request to the original open().

Using the same method, we can replace any of the network system calls too. The following code hooks the network system call connect(); the actual system call is replaced with new_connect().

    asmlinkage long new_connect(int fd, struct sockaddr __user *buff1, int flag)
    {
        /* flag is connect()'s third argument: the length of the address in buff1 */
        int ret, ret1, ret2, fc;
        struct sockaddr_in getsock, getpeer;
        struct sockaddr_in *getsock_p, *getpeer_p;
        int socklen;
        char netinfo_buff[200], path[120];
        char buff[100];
        mm_segment_t old_fs;

        socklen = sizeof(getsock);
        old_fs = get_fs();
        set_fs(KERNEL_DS);   /* let the original calls accept kernel buffers */
        /* Local address currently bound to the socket */
        ret1 = original_getsockname(fd, (struct sockaddr *)&getsock, &socklen);
        getsock_p = &getsock;
        /* Peer address; before original_connect() has run this call can
           fail with ENOTCONN, and ret2 is not checked */
        ret2 = original_getpeername(fd, (struct sockaddr *)&getpeer, &socklen);
        getpeer_p = &getpeer;
        set_fs(old_fs);

        if (getsock.sin_family == AF_INET) {
            char *s1 = inet_ntoa(getsock.sin_addr);
            char *s2 = inet_ntoa(getpeer.sin_addr);
            /* Internal filter: skip identical addresses, 0.0.0.0 and
               any address containing "192.168" */
            if (strcmp(s1, s2) && strcmp(s1, "0.0.0.0") && strcmp(s2, "0.0.0.0") &&
                !strstr(s1, "192.168") && !strstr(s2, "192.168")) {
                print_time(USER_TIME);
                strcpy(netinfo_buff, USER_TIME + 1);
                ret = get_username(USER_NAME);
                if (ret < 0) {
                    printk(KERN_ALERT "\n error in get_username");
                } else {
                    strcat(netinfo_buff, USER_NAME);
                }
                /* Append #Connect#src_ip#src_port#dst_ip#dst_port */
                snprintf(buff, 9, "#%s", "Connect");
                strcat(netinfo_buff, buff);
                snprintf(buff, 18, "#%s", inet_ntoa(getsock.sin_addr));
                strcat(netinfo_buff, buff);
                snprintf(buff, 10, "#%u", my_ntoh(getsock.sin_port));
                strcat(netinfo_buff, buff);
                snprintf(buff, 18, "#%s", inet_ntoa(getpeer.sin_addr));
                strcat(netinfo_buff, buff);
                snprintf(buff, 10, "#%u\n", my_ntoh(getpeer.sin_port));
                strcat(netinfo_buff, buff);
                strcpy(path, "/home/bk/output/network/");
                strcat(path, log_filename);
                write_file(netinfo_buff, path);
            }
        }
        /* Always pass the request on to the original connect() */
        return original_connect(fd, buff1, flag);
    }

Here new_connect() receives exactly the same parameters as the actual connect() system call. It uses two original system calls to gather information: getsockname() and getpeername(). getsockname() retrieves the locally bound address of the specified socket, and getpeername() retrieves the address of the peer connected to the socket. Both store the address in the sockaddr structure pointed to by their address argument. The new system call then considers only AF_INET sockets, that is, IPv4 Internet protocol sockets. Next it checks that neither IP address is 0.0.0.0, that neither address contains the string "192.168" (strstr() makes this a substring test rather than a strict prefix test), and that the source and destination IP addresses differ. In this way the internal filter drops entries that do not need to be logged.

After collecting all the information in the buffer, it does exactly what new_open() does: it uses the original write system call to write the retrieved information into the log file. The code for the other network i/o system calls, such as new_accept(), new_sendto() and new_recvfrm(), follows the same pattern.

3.6 Cleanup part of the module

When the module is unloaded, the kernel executes the cleanup part of the module. The following code represents the cleanup part.
    /* Restore the original entries in the system call table */
    syscall_table[__NR_open]     = original_open;
    syscall_table[__NR_write]    = original_write;
    syscall_table[__NR_read]     = original_read;
    syscall_table[__NR_sendto]   = original_sendto;
    syscall_table[__NR_recvfrom] = original_recvfrom;
    syscall_table[__NR_connect]  = original_connect;
    syscall_table[__NR_accept]   = original_accept;

It simply replaces the addresses of the new system calls with the addresses of the original system calls in the system call table. The module can therefore be removed without affecting the actual kernel, and after unloading, the kernel continues to execute the original system calls.

3.7 Output of phase-I

Once the LKM is loaded, every request to the hooked system calls is redirected to our new system calls, which extract the required information and write it to a log file. The name of the log file is based on the current date of the system. The log file can be created at any location on the system by defining the proper path in the new system calls. Samples of the file i/o log and the network i/o log are given below.

Sample of log file for file i/o:

22:23:35#bk#RD#/home/bk/Desktop/secret/hello.txt
22:23:38#bk#WR#/home/bk/Desktop/secret/hello.txt
23:55:03#bk#WR#/home/bk/Desktop/shared_docs/temp.txt
23:56:03#bk#WR#/home/bk/Desktop/shared_docs/temp.txt

Sample of log file for network i/o:

23:39:55#bk#RECEIVE#firefox-bin#192.168.188.138#34475#174.76.227.118#80
23:39:56#bk#Connect#ssh#192.168.188.138#0#74.125.224.116#39519
23:39:57#bk#SEND#ssh#192.168.188.138#43545#74.125.224.116#53
23:39:58#bk#RECEIVE#ssh#192.168.188.138#43545#74.125.224.116#53

Chapter 4 A WEB-BASED APPLICATION FOR MONITORING FILE I/O AND NETWORK I/O

4.1 Why is filtering required?

The log files generated in the system phase contain a large number of entries, because every request is logged. When a user reads or writes a file, the log file contains not a single entry for that request but multiple entries for different files.
This is because, to service a single file-related system call, the operating system must access multiple system files, and those accesses are logged as well. This can increase the size of the log file dramatically: besides the user's own file access requests, it contains many additional records. We do not need these records, and they occupy a lot of disk space. The only data we need is the data of user processes, so we extract those entries from the log file. After filtering, the original log file can be deleted, because there is no need to keep its remaining entries.

For network log files, no entries are deleted. Because the entire network i/o is logged and we do not know in advance which IP addresses are relevant, filtering cannot be applied to the network log files. This can be considered a limitation, but it can be worked around by filtering the network data on some constraints: the system admin can impose constraints that restrict the data to a limited set of entries.

4.2 Automation schema

To implement the filtering capability, we need scripts that read the log file and keep only the entries that satisfy the constraints. The automation scripts could be written in any scripting language; here all the scripts are written in Perl. Perl is a lightweight scripting language with powerful text-processing features that are easy and straightforward to use. The overall automation schema is given in the following figure.

Figure 2 Automation schema

The automation schema is divided into multiple layers, with different scripts running at different layers. As the figure shows, two input files are required initially. The first is the log file generated by the new system calls in the system phase. The second is the filter information file, which is created by the system admin.
This file contains the absolute paths of only those files whose file i/o should be recorded, so the automation script extracts from the log file only the entries listed in the filter information file. The filtered log file is called final_log, and its entries must then go into the database. The MySQL database has separate tables for file i/o and network i/o information. A Perl script keeps running in the background and dumps the final_log entries into the corresponding database table. Once the values are in the database, the system admin can view them through the web interface, which is developed in PHP. The admin can run queries with different constraints and view the results in a browser.

4.3 Filtering Script

This script reads two files, the log file and the filter information file, and searches the log file for the entries given in the filter information file. First we have to locate the log file. Each log file is named after the current date, so we get the current date and then look for the file with that name. The following Perl code computes the current date.

    @timeData = localtime(time);
    $year = $timeData[5] + 1900;
    # localtime() returns a 0-based month (0 = January), so add 1; the log
    # file names written by the kernel module must use the same convention
    $month = $timeData[4] + 1;
    if ($month < 10) { $month = "0" . $month; }
    $day = $timeData[3];
    if ($day < 10) { $day = "0" . $day; }
    $today_date = $month . "_" . $day . "_" . $year;
    print "\nfinal date is :" . $today_date;

Once we have the current date, we can read the file named after it. We also have to read the filter information file created by the system admin. The following code reads a file line by line and stores its content in an array, where each element holds a single line of the file.
    $log = "log.txt";
    open(LOGINFO, '<', $log) or die "Cannot open $log: $!\n";
    @filter_this = <LOGINFO>;
    close(LOGINFO);

Here each element of the array @filter_this holds a single line of the filter information file. The content of the entire array can be printed with the following line.

    print "@filter_this";

The same code is used to read both files, the log file and the filter information file. Once the contents are stored in two arrays, we scan them and look for log entries that match the entries of the filter array. To match strings, Perl's regular expression support is used; Perl makes it easy to search for one string within another. The following code iterates through both arrays and looks for log lines that contain an element of the filter array.

    for ($i = 0; $i <= $#filter_this; $i++) {
        # Remove the newline read from the filter file; skip blank entries
        chomp($string = $filter_this[$i]);
        next if $string eq '';
        for ($j = 0; $j <= $#lines; $j++) {
            # \Q...\E matches the path literally ('.' is not a wildcard)
            if ($lines[$j] =~ /\Q$string\E/) {
                print WFILE $file_date . "#" . $lines[$j];
            }
        }
    }

Here the test ($lines[$j] =~ /\Q$string\E/) checks whether the line $lines[$j] contains the string $string. If a match is found, the line is written to another file, the final log file; otherwise the search continues with the next array element. After the entire iteration, the new file, final_log, contains all the matched entries. The original log file is no longer needed and would consume a lot of disk space, so it is removed with the following code.

    $result = unlink($file);

Here $file is the name of the log file to be deleted. The new system calls immediately create a new log file, so later the script can scan the new log file and follow the same procedure again. The entire script should therefore be executed periodically to keep filtering the data.
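The matching step of the filtering script amounts to a literal substring test, the same idea as the strstr() filter inside new_open(). The following user-space C sketch is a hypothetical illustration, not part of the project: the function name line_matches() and the 256-byte limit on a filter entry are assumptions. It ignores the trailing newline left by reading the filter file line by line and never matches an empty entry, which would otherwise match every line.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical sketch: return 1 if log_line contains filter_entry as a plain
   substring (no regex), 0 otherwise. A trailing "\n" or "\r\n" on the filter
   entry is ignored; empty entries, and entries of 256 bytes or more, never match. */
int line_matches(const char *log_line, const char *filter_entry)
{
    char needle[256];
    size_t n = strcspn(filter_entry, "\r\n");   /* length up to the newline */

    if (n == 0 || n >= sizeof needle)
        return 0;
    memcpy(needle, filter_entry, n);
    needle[n] = '\0';
    return strstr(log_line, needle) != NULL;
}
```

With the sample file i/o log of Section 3.7, the filter entry /home/bk/Desktop/secret/hello.txt matches the first two lines but not the shared_docs lines. Because the test is literal, a '.' in a path only matches a '.', unlike an unescaped regular expression.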
To execute the script periodically, we put its statements in an infinite loop that sleeps for some time after each pass and then repeats the same procedure:

    while (1) {
        # Script statements as above
        sleep 120;
    }

Here all the statements in the loop are executed once, after which the script sleeps for 120 seconds and then executes the whole cycle again. The script therefore runs roughly every 2 minutes (120 seconds plus the time taken by one pass). The script produces the final filtered log file, which is named report_current_date.txt.

4.4 Insert data script

Once the final log file is created, its entries must be dumped into the database. The database holds the log information in tables, which makes the data easy to manage and query. So one more script is required, which periodically dumps the data from the final log files into the database. MySQL can load data from a text file directly into a table; the text must have exactly the expected format, with a valid field separator and line separator, and the whole file can then be loaded with a single query. The insert data script first connects to the database, then runs the query that loads the file into the table, and closes the connection once all the data has been inserted. This script is also invoked periodically, in the same way as above. The code for the script is given below.
    use DBI;

    # Full path of the final report; the same path is used for loading and deleting
    $outputfile_path = "/home/bk/output/file/" . $outputfile_name;
    $dbh = DBI->connect('dbi:mysql:10_21_syscall', 'root', 'root')
        or die "Connection Error: $DBI::errstr\n";
    $sql = "LOAD DATA LOCAL INFILE '" . $outputfile_path . "' INTO TABLE fileio "
         . "COLUMNS TERMINATED BY '#' LINES TERMINATED BY '\\n'";
    $sth = $dbh->prepare($sql);
    $sth->execute or die "SQL Error: $DBI::errstr\n";
    $dbh->disconnect;
    $result = unlink($outputfile_path);

All the entries in the file are inserted as rows of the fileio table, and the final report is then deleted. The filter data script generates a similar file again, and the next time the insert data script runs, it dumps the new data into the database. The final log file is deleted because we do not want duplicate entries in the database: once it is deleted, the newly created file contains only new entries, so the database stays in a consistent state.

4.5 Database Design

There are two tables for storing the log file information, one for file i/o and one for network i/o. The information stored in each table is given below.

Table 5 Information captured by hacking file i/o and network i/o

File i/o                   Network i/o
date                       date
time                       time
user                       user
operation (read/write)     operation (connect, accept, sendto, recvfrm)
filepath                   filename
                           source ip
                           source port number
                           destination ip
                           destination port number

4.6 Web interface

The web interface is developed in PHP. The first page is a form where the user must enter a date range, a time range and the type of operation; these fields are mandatory. The user can also enter a username and the absolute path of a file to filter the data further. Once all the details are provided, the user clicks the submit button. PHP uses the entered data to build a query string and runs the query against the database. The next page receives the result data and renders it as an HTML table, so the user can view the data as a report.
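The query-building step of the web interface can be sketched as follows, written in C (the language of the kernel module) rather than PHP. This is a hypothetical illustration, not the project's PHP code: the function names, the column names (taken from Table 5) and the use of '?' placeholders for a prepared statement are assumptions. The mandatory form fields always produce the same WHERE clause, and the optional username and file path each add one condition. Binding the form values as parameters, instead of pasting them into the SQL text, keeps user input from changing the query.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* Append s to the NUL-terminated string in out; return -1 if it would not fit. */
static int append(char *out, size_t len, const char *s)
{
    size_t used = strlen(out), add = strlen(s);

    if (used + add >= len)
        return -1;
    memcpy(out + used, s, add + 1);   /* copy including the terminating NUL */
    return 0;
}

/* Hypothetical sketch: build the report query for the fileio table.
   Returns the number of '?' placeholders to bind, or -1 if out is too small. */
int build_fileio_query(char *out, size_t len, int has_user, int has_path)
{
    int params = 5;   /* date range (2), time range (2), operation (1) */
    int n = snprintf(out, len,
                     "SELECT date, time, user, operation, filepath FROM fileio"
                     " WHERE date BETWEEN ? AND ?"
                     " AND time BETWEEN ? AND ?"
                     " AND operation = ?");

    if (n < 0 || (size_t)n >= len)
        return -1;
    if (has_user) {   /* optional username filter */
        if (append(out, len, " AND user = ?") < 0)
            return -1;
        params++;
    }
    if (has_path) {   /* optional absolute file path filter */
        if (append(out, len, " AND filepath = ?") < 0)
            return -1;
        params++;
    }
    return params;
}
```

The returned count tells the caller how many form values to bind, in the order the conditions appear in the string.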
Figure 3 Screenshot of Form to query the database

Figure 4 Screenshot of result of query

Chapter 5 SUMMARY

5.1 Summary

Such an LKM can be very useful for monitoring user activities on the local machine as well as on remote machines. On a central repository, we can monitor every file access request by logging the required information. The system admin can check the log file and find which files were touched by a user through local or remote access to the system. Logging the network i/o is also useful: for example, one can monitor access to a specific server by providing the server IP address in the configuration file for the LKM.

The LKM can slow the system down, because every request to a hooked system call goes through the new system call, and logging the information requires additional disk i/o. As a result, the response time of the system can increase. This is a trade-off between the security benefit of the LKM and the response time of the system. To mitigate it, the LKM should be kept as lightweight as possible and used where response time is not the primary concern.

Chapter 6 FUTURE WORK

6.1 Future work

The LKM can also hook other system calls; which ones depends on what information we need to collect and how we plan to use it. If the design of the system calls changes in future kernels, the LKM will again need to be ported to the latest kernel source, with all the changes needed to make it work on the latest OS. The LKM is a very powerful facility offered by the operating system, and it can be used to implement other utilities that deal with system calls. One could also implement a central system that collects the log file data from all over the network and shows the file i/o and network i/o snapshots of different machines on a shared network.

REFERENCES
1. The Linux Kernel Module Programming Guide by Peter Jay Salzman, Michael Burian, Ori Pomerantz. May 2007.
2. Understanding the Linux Kernel by Daniel P. Bovet, Marco Cesati. Third Edition. November 2005.
3. Linux Kernel Development by Robert Love. Second Edition. Novell Press, 2006.
4. Linux Loadable Kernel Module HOWTO by Bryan Henderson. 2006. (online) http://tldp.org/HOWTO/Module-HOWTO/
5. Linux Security Module by Tushar Nileshkumar Dave. 2007.
6. Loadable Kernel Module Programming and System Call Interception by Nitesh Dhanjani and Gustavo Rodriguez. 2001. (online) http://www.linuxjournal.com/article/4378
7. Linux Cross Reference (full source code of the kernel). (online) http://lxr.free-electrons.com/
8. Driving Me Nuts - Things You Never Should Do in the Kernel by Greg Kroah-Hartman. Linux Journal, 2006. (online) http://www.linuxjournal.com/article/8110?page=0,1