USER ACTIVITIES MONITORING SYSTEM USING LKM

Bhaumik Patel
B.E., Sardar Patel University, INDIA, 2008
PROJECT
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
COMPUTER SCIENCE
at
CALIFORNIA STATE UNIVERSITY, SACRAMENTO
FALL
2011
USER ACTIVITIES MONITORING SYSTEM USING LKM
A Project
By
Bhaumik Patel
Approved by:
__________________________________, Committee Chair
Jinsong Ouyang, Ph.D.
__________________________________, Second Reader
Chung-E Wang, Ph.D.
__________________________________
Date
Student: Bhaumik Patel
I certify that this student has met the requirements for format contained in the University format
manual, and that this project is suitable for shelving in the Library and credit is to be awarded for
the Project.
__________________________, Graduate Coordinator
Nikrouz Faroughi, Ph.D.
Department of Computer Science
________________
Date
Abstract
of
USER ACTIVITIES MONITORING SYSTEM USING LKM
by
Bhaumik Patel
Security is one of the major challenges when a single machine is shared among multiple
users. Linux is an operating system that supports multiple users. All users can
access different files in the system, either from the local machine or
over a network connection. Any inappropriate action by a user can cause a system failure
or other unexpected trouble. The activities of all users must therefore be monitored in
order to identify the exact reason for a system failure and the user who is responsible for
it.
In the operating system, the system call is the only window for a user process to get into the
kernel and access the resources provided by the kernel. Linux Security Module
(LSM) is a Loadable Kernel Module (LKM) which intercepts the file i/o system calls and
network i/o system calls in order to log valuable information. It adds a layer between
the user process and the actual system call by replacing the actual system call with a spy
system call. The LSM supports 32 bit machines and an older version of the kernel. As 64 bit
machines are common today, LSM needs to be ported to a 64 bit machine with the
latest version of the kernel source. To port the LSM to the latest hardware and the latest
kernel, changes are required and LSM needs to be upgraded based on the current system call
structure.
User activities monitoring system using LKM includes the upgraded LSM as a system layer
utility. It is required to hack the file i/o and network i/o and to generate the log files
based on the gathered information. As an application layer utility, it also includes an
automation system, which is required to filter the data from the log files and insert that data into
the database. It also has a GUI based web interface to query the data in the database and
to generate reports for the system administrator.
The entire system helps to monitor user activities both on the local machine
and on the network. Using this tool, the administrator of the system can trace the
file i/o and network i/o, so in case the system goes down, the admin can investigate
the activities done by different users and explore the actual reason for the crash.
__________________________________, Committee Chair
Jinsong Ouyang, Ph.D.
_________________________
Date
DEDICATION
Dedicated to my loving parents who inspired me to work hard and my brother for his constant
support.
ACKNOWLEDGMENTS
I would like to thank everyone who encouraged and motivated me throughout my Master’s
Project. I am grateful to Dr. Jinsong Ouyang for his constant guidance and useful advice
all the time. He helped me a lot in identifying different problems and in building the solution
step by step. He also provided me with reading materials such as books and published papers to
explore the Linux kernel even more. At the same time, I would also like to thank Prof.
Chung-E Wang for being a driving force behind my interest in different algorithms and
for providing me continuous support.
Furthermore, I would also like to thank the entire Linux Kernel Developers community.
This community has helped me a lot by providing guidance on kernel module
development. Linux developers around the world are really active in providing answers in
discussion forums, and this always helped me to solve different problems.
In the end, I would like to thank my parents and all my friends for giving me full support
and motivation in the completion of the project.
TABLE OF CONTENTS
Dedication
Acknowledgments
List of Tables
List of Figures
Chapter
1. INTRODUCTION
    1.1 Overview
    1.2 Objectives
2. LOADABLE KERNEL MODULE (LKM)
    2.1 What is LKM?
    2.2 Basic Structure of LKM
    2.3 System Call
    2.4 LSM (Linux Security Module)
    2.5 Porting
3. KERNEL INSTRUMENTATION FOR MONITORING FILE I/O AND NETWORK I/O
    3.1 System part overview
    3.2 Basic pseudo code for LKM
    3.3 Locating System call Table
    3.4 Overwriting the addresses of system call
    3.5 Writing a new system call
    3.6 Cleanup part of the module
    3.7 Output of phase-I
4. A WEB-BASED APPLICATION FOR MONITORING FILE I/O AND NETWORK I/O
    4.1 Why filtering is required?
    4.2 Automation Schema
    4.3 Filtering script
    4.4 Insert data script
    4.5 Database design
    4.6 Web interface
5. SUMMARY
    5.1 Summary
6. FUTURE WORK
    6.1 Future work
References
LIST OF TABLES
Table 1 System calls related to file i/o and network i/o
Table 2 LKM commands and their functionality
Table 3 Comparison of system call structure between 32 and 64 bit machine
Table 4 System calls and corresponding system call numbers
Table 5 Information captured by hacking file i/o and network i/o
LIST OF FIGURES
Figure 1 Overall structure of system and position of system call
Figure 2 Automation schema
Figure 3 Screenshot of Form to query the database
Figure 4 Screenshot of result of query
Chapter 1
INTRODUCTION
1.1 Overview
Linux is a multi-user operating system. Many different users can share a single Linux
machine and can access the machine locally or through the network. In Linux, all
file system requests and network requests are satisfied by system calls. System calls
are the only window for user processes to enter the kernel and use the shared
resources in a proper manner. Each request invokes the corresponding system call and
the kernel provides the specific service for that system call.
To provide operating system level security, we can alter the code of a system call and
customize it to run our own code. Every time a request for file
i/o or network i/o occurs, the execution flow then passes through the modified system call. In
our customized system call, we can gather the information required for security purposes.
For file i/o, information such as who is asking for file access, the
purpose of the request (reading or writing), the time stamp of the request and the absolute path of
the requested file can be very useful to log for file system security. For network
i/o, information such as who is invoking the network request, the time stamp, the filename of
the involved file, the IP addresses of source and destination and the port numbers of source and
destination can be useful to log.
To hack a system call, an LKM (Loadable Kernel Module) can be really helpful. Using
it, we can add our own system calls to the kernel and we can also modify an
existing system call. Linux Security Module (LSM) is itself an LKM, which hacks
different file i/o system calls and network i/o system calls. LSM was originally developed
by Tushar Dave in 2007. LSM was designed for 32 bit machines and it supports an
older version of the Linux kernel. Nowadays machines are 64 bit and the Linux kernel source has
also advanced considerably. LSM needs to be ported to the latest Linux kernel source and to 64 bit
machines.
1.2 Objectives
The objective is to write a Loadable Kernel Module (LKM) for a 64 bit machine that hacks the file i/o and
network i/o system calls, so that each request to an actual system call goes
through the replaced system call. Within the replaced system call we can extract the required
information and write it into a log file. Two separate log files will be created, one for file
i/o and another for network i/o. The corresponding system calls are as follows.
Table 1 System calls related to file i/o and network i/o
File System Calls    Network System Calls
Open                 Connect
Read                 Accept
Write                Sendto
                     Recvfrom
Once all the above mentioned system calls are hacked and the LKM starts generating the log
files, the log files should contain the data about user activities that we wanted to
gather. Now we can use some automation to filter the log file data. As a log file holds
the information related to every request to those system calls, we might not be
interested in looking at the entire log. The log files can also grow very large. Using the
filtering capability, we can keep only the information that we are interested in. After
extracting all that information, we can remove the log file from the disk.
A mechanism is also required to query the filtered log file. For that, another part of the
automation script periodically dumps the filtered log file data into the database.
That way the data is maintained in an organized form within the database. On top of
that, a web interface with a GUI is also required to query the data and to
generate reports for the system administrator.
Chapter 2
LOADABLE KERNEL MODULE (LKM)
2.1 What is LKM?
A loadable kernel module is a way of extending the kernel code. There are two
ways of changing the code in the Linux kernel.
[1] Actually changing the kernel code and rebuilding the kernel
[2] Loadable kernel module
The advantage of an LKM over the first method is that the kernel does not need to be
recompiled and rebuilt. An LKM can be dynamically loaded into and unloaded from the kernel
without actually rebuilding the kernel code. It can be used to achieve many of the
advantages of a microkernel without additional performance penalties. Almost every
component of Linux, such as device drivers, system call related modules, executable
formats and so on, is eligible to be written as a kernel module. LKMs are also supported in
most other operating systems, such as Microsoft Windows, FreeBSD and Mac OS.
LKMs have a lot of advantages over changing the actual base kernel. An LKM can also help
us to diagnose system problems. An LKM can save memory because it does not have to be
in memory all the time; it has to be in memory only while some other process
is using the LKM inside the OS. After loading a module into memory, the kernel
maintains a usage count for that module. The count indicates how many other processes
are currently using the module. A module can be unlinked if the usage count is 0.
Multiple modules can also have dependencies among each other. So if module B is
dependent on module A, then module A should be loaded prior to module B.
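The usage count rule can be illustrated with a small user-space sketch. This is only a toy model under our own assumptions: the structure toy_module and the functions toy_module_get(), toy_module_put() and toy_module_unload() are hypothetical names and are not the kernel's own bookkeeping.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's per-module bookkeeping. */
struct toy_module {
    const char *name;
    int usage_count;   /* number of current users of the module */
    int loaded;        /* 1 while the module is resident in memory */
};

/* A user starts using the module: raise the count. */
static void toy_module_get(struct toy_module *m)
{
    m->usage_count++;
}

/* A user is done with the module: lower the count. */
static void toy_module_put(struct toy_module *m)
{
    if (m->usage_count > 0)
        m->usage_count--;
}

/* Unloading succeeds only when the usage count is 0. */
static int toy_module_unload(struct toy_module *m)
{
    if (m->usage_count != 0)
        return -EBUSY;
    m->loaded = 0;
    return 0;
}
```

In this sketch an unload request made while the count is non-zero is refused with -EBUSY, and it succeeds once every user has released the module.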
2.2 Basic Structure of LKM
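A skeleton for an LKM is given below.
#include <linux/module.h>
#include <linux/kernel.h>

MODULE_LICENSE("GPL");

/* Entry part: invoked when the module is loaded into the kernel. */
int init_module(void)
{
    /* Code to perform the operation inside the kernel */
    return 0;
}

/* Exit part: invoked when the module is unloaded from the kernel. */
void cleanup_module(void)
{
    /* Undo everything done in init_module() */
}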
Every kernel module has two parts: entry and exit. In the code above, the function
init_module() is the entry part. When the module is loaded into the kernel, this function
is invoked. Inside this function we can add the code that performs various
operations inside the kernel. It may add new capabilities or it can modify the existing
capabilities. In the same code, we also have the exit part, which is the cleanup_module() function.
Whenever the module is unloaded from the kernel, this function is invoked. Usually
this part has the code to do the cleanup, which means that whatever we have done in the init part is
undone in the cleanup part. As the LKM is going to use system resources, it should
be written very carefully, and the two parts, init and cleanup, should be exact reverses of each other. So if
the init part acquires some resources, then the cleanup part should release all the
acquired resources.
A few commands related to LKMs that can be invoked from the Linux
terminal are given below.
Table 2 LKM commands and their functionality
Command     Functionality
insmod      To load the module into kernel
rmmod       To remove (unload) the module from kernel
modprobe    Automatically detect the dependencies and insert/remove module accordingly
depmod      Determine interdependencies between multiple LKMs
lsmod       To list all the currently loaded modules
modinfo     Print out the information about one or more modules
2.3 System Call
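[Figure 1: layers from top to bottom are user processes (user mode); interrupts, system calls and exceptions at the boundary between user mode and kernel mode; the kernel (kernel mode); hardware.]
Figure 1 Overall structure of system and position of system call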
As we can see from Figure 1, the system call is the only passage for a user process to
get inside the kernel. User processes do not have direct access to the kernel resources. But
using a system call they can ask the kernel to execute on their behalf to get
access to the underlying resources. In Linux, the entire file system is also considered a
valuable resource and no user process is allowed to access the file system directly. The
integrity and consistency of the file system is an important aspect and each request to a file
is monitored by the kernel. There are mainly three system calls which deal with the file
system. They are open(), read() and write(). Every user process needs to invoke the open()
system call in order to request a specific file. Only after successfully opening
the file can the user process invoke the read() system call to read from the file and the write() system
call to write into that file. By hacking the open() system call, we can add our own system
call to handle the open requests. Then, every time a user process wants to read
something from a file, it will invoke the open() system call, but instead of open(), our own
system call will be executed. In our own system call we can take advantage of the user
information and can extract additional data from it. We can store the data in a log file
and then continue by invoking the actual system call.
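The idea of logging a request and then passing it on can be sketched in user space with a function pointer. This is only an illustration under our own assumptions: real_open(), spy_open() and the last_log buffer are hypothetical stand-ins, and the real module works on the kernel's system call table rather than on an ordinary pointer.

```c
#include <stdio.h>

/* Signature shared by the "real" call and its replacement. */
typedef int (*open_fn)(const char *filename, int flags);

/* Stand-in for the real open(): returns a fake descriptor. */
static int real_open(const char *filename, int flags)
{
    (void)filename;
    (void)flags;
    return 3;
}

static open_fn original_open = real_open; /* saved original */
static char last_log[128];                /* stand-in for the log file */

/* Replacement: record the request, then forward it unchanged. */
static int spy_open(const char *filename, int flags)
{
    snprintf(last_log, sizeof(last_log), "open %s flags=%d", filename, flags);
    return original_open(filename, flags);
}
```

Because spy_open() returns whatever the original returns, a caller that goes through spy_open() sees the same result as before, while the request is recorded on the side.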
The Linux kernel also has many system calls for network requests. Some of them are
connect(), accept(), sendto() and recvfrom(). connect() is the system call invoked by the
client process running on another machine in the network. To accept the client
connection, the server process invokes the accept() system call. After the connection
has been established successfully, both processes can communicate with each other
using the sendto() and recvfrom() system calls. We can use the same trick as above to
hack the network calls as well. Here too, after extracting the information from the
requests, we can invoke the actual system calls, so that user processes running on the
network will not notice the replacement of the system calls.
2.4 LSM (Linux Security Module)
Linux Security Module is a similar effort done in the past. LSM supports a 32 bit
operating system and an older version of the kernel. It was developed in 2007. Since then,
many versions of the kernel source have been released. Nowadays, 64 bit machines
are common, so LSM needs to be updated to support the
latest kernel source and 64 bit machines as well.
2.5 Porting
Porting requires the LKM for system call hacking to be modified. It should be updated to
work on the latest 64 bit hardware and with the latest kernel source code. Several
changes are required for the LKM to work on a 64 bit machine.
The header file unistd.h is required to access the system call numbers. When we include
this file, based on the underlying architecture of the machine, it includes
unistd_32.h for a 32 bit machine and unistd_64.h for a 64 bit machine. In our case,
unistd_64.h is included. The two files assign different system call numbers
to the same system call. For example, __NR_open has the value 5 in unistd_32.h
and the value 2 in unistd_64.h. This does not affect the LKM, because the LKM always
refers to the open system call number as __NR_open. The structure of the system
calls involved in network i/o, however, differs between the two files. In the previous
implementation of the LKM, the only system call that needed to be
hacked to log the network i/o was sys_socketcall(). The system call number for sys_socketcall() was 102,
which is indicated by __NR_socketcall. All network related system call requests are
redirected to sys_socketcall(), and in its body there is a big switch case which identifies
the specific system call request and invokes the corresponding function to satisfy the
request. On 64 bit machines, the corresponding file unistd_64.h does not define
__NR_socketcall. So for a 64 bit machine, all the network related system calls need to be
hacked separately.
Table 3 Comparison of system call structure between 32 and 64 bit machine
32 bit machine                                  64 bit machine
sys_socketcall(int call, unsigned long *args)   sys_connect(....)
{                                               {
    switch(call)                                }
    {                                           sys_accept(....)
    case SYS_CONNECT: call sys_connect()        {
    case SYS_ACCEPT : call sys_accept()         }
    case SYS_SENDTO : call sys_sendto()         sys_sendto(...)
    ....                                        {
    }                                           }
}
With every new version, the kernel developers may add a new set of
system calls or modify the current structure of the system calls. So porting
is always necessary to ensure that we get the same facilities even though the
kernel has been changed.
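The difference between the two structures can be sketched in user space as follows. The names toy_socketcall(), toy_table64 and the TOY_* codes are our own assumptions for illustration; they only mimic the shape of the multiplexed 32 bit entry point and of the separate 64 bit entries.

```c
/* Hypothetical sub-call codes, standing in for SYS_CONNECT and friends. */
enum { TOY_CONNECT = 3, TOY_ACCEPT = 5, TOY_SENDTO = 11 };

static int hits_connect, hits_accept, hits_sendto;

static long toy_connect(void) { hits_connect++; return 0; }
static long toy_accept(void)  { hits_accept++;  return 0; }
static long toy_sendto(void)  { hits_sendto++;  return 0; }

/* 32 bit style: one entry point, so a single hook sees every network call. */
static long toy_socketcall(int call)
{
    switch (call) {
    case TOY_CONNECT: return toy_connect();
    case TOY_ACCEPT:  return toy_accept();
    case TOY_SENDTO:  return toy_sendto();
    default:          return -1;
    }
}

/* 64 bit style: each call owns its own slot and must be hooked separately. */
typedef long (*toy_call_fn)(void);
static toy_call_fn toy_table64[] = { toy_connect, toy_accept, toy_sendto };
```

With the first structure, replacing toy_socketcall() alone is enough to observe all three operations. With the second structure there is no common entry point, so each slot of toy_table64 has to be replaced on its own.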
Chapter 3
KERNEL INSTRUMENTATION FOR MONITORING FILE I/O AND NETWORK I/O
3.1 System part overview
The system part of User activities monitoring system using LKM consists of writing an
actual LKM to hack different file i/o and network i/o system calls and generating the log
files for both. After writing the module, we need to load it on the Linux
machine. Once the module is loaded, it replaces the definition of the current system
call and puts the customized system call in its place. All requests are diverted to
our own system call, and after extracting the information from a request, the module
writes that request into the log file. Log file names are based on the current date. So if
the date is 10/23/2011, then the module generates two log files, one for file i/o and one for
network i/o, each with the name “10_23_2011”. We can therefore search for the file named after today’s date
to analyze the file i/o and network i/o for today.
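The naming rule can be sketched in user space with strftime(). This is an assumption made for illustration only: the helper log_name_from_tm() is hypothetical, and a kernel module cannot call the C library, so the module itself has to build the name by other means.

```c
#include <time.h>

/* Build a "MM_DD_YYYY" name from a broken-down time. Returns 0 on success. */
static int log_name_from_tm(const struct tm *tm, char *out, size_t outlen)
{
    /* strftime() returns 0 when the result does not fit into the buffer. */
    return strftime(out, outlen, "%m_%d_%Y", tm) ? 0 : -1;
}
```

For a struct tm describing 23 October 2011 (tm_year = 111, tm_mon = 9, tm_mday = 23), the helper produces the name 10_23_2011 used in the example above.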
3.2 Basic pseudo code for LKM
#include header files
init()
{
    Locate the system call table (base address);
    Get the addresses of the different system calls;
    Save the original addresses of all the system calls;
    Overwrite the addresses with our new_system_calls;
}
cleanup()
{
    Retrieve the original addresses from where they were stored and
    write them back into the system call table;
    (undo everything done in init)
}
new_system_call(parameters)
{
    Log the information into log files;
    Invoke the actual system call;
}
3.3 Locating System call Table
Following the pseudo code, we first have to locate the system call table inside
the operating system. The system call table is the table which maps each system call
number to the address of the corresponding system call. To invoke a system call, a user
process needs to generate a software interrupt and pass the system call number to the
kernel for the lookup. So when a specific system call is invoked, the kernel uses
the system call number to do a lookup inside the system call table. The kernel searches for
the entry corresponding to that system call number and extracts the address of that
particular system call. Then the execution flow jumps to that address to execute the
system call. Once the system call has been executed, based on the nature of the particular system
call, the switch from kernel mode back to user mode is done. The mapping from
system call to the corresponding number can be found in the unistd.h file inside the system.
For a 32 bit machine the file is named unistd_32.h and for a 64 bit machine it is named
unistd_64.h.
The following table shows some of the system calls and the corresponding system call numbers
for a 64 bit OS. These numbers are defined in the unistd_64.h file in the Linux source.
Table 4 System calls and corresponding system call numbers
System call    System call Number
read()         0
write()        1
open()         2
close()        3
connect()      42
accept()       43
sendto()       44
recvfrom()     45
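From user space, the role of the system call number can be observed with the syscall() wrapper of the C library, which takes the number as its first argument. The sketch below assumes a Linux system with glibc; the helper pid_via_syscall_number() is a hypothetical name, and getpid is used instead of the calls in Table 4 only because it needs no arguments.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel for our process id by number instead of through getpid(). */
static long pid_via_syscall_number(void)
{
    /* SYS_getpid is the symbolic name of the system call number. */
    return syscall(SYS_getpid);
}
```

The kernel uses the number passed here to pick the entry in the system call table, so the result is the same as calling getpid() directly.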
To locate the system call table, there are three methods:
[1] Using the exported symbol sys_call_table
Linux kernel version 2.4 has an exported symbol called “sys_call_table”.
A module can use any of the exported symbols directly. So in kernel 2.4, we can simply
read the value of the symbol sys_call_table to get the address of the system call table. For
security reasons, the symbol sys_call_table is no longer exported in the Linux kernel. As
we are using kernel 2.6, we cannot use this method. For kernel 2.6, we can use the other
two methods.
[2] Brute force scanning of the kernel memory range
For this we need to write a small LKM. This LKM scans the kernel memory. For a
32 bit OS, the kernel memory range is from 0xc0000000 to 0xd0000000. By
scanning the entire kernel memory range, we try to find the location of the system call table.
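#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/syscalls.h>    /* declares sys_close */
#include <linux/unistd.h>      /* defines __NR_close */

#define START_MEM 0xc0000000
#define END_MEM   0xd0000000

unsigned long *syscall_table;

/* Scan kernel memory for a table whose __NR_close entry holds sys_close. */
unsigned long **find(void)
{
    unsigned long **sctable;
    unsigned long int i = START_MEM;

    while (i < END_MEM)
    {
        sctable = (unsigned long **)i;
        if (sctable[__NR_close] == (unsigned long *)sys_close)
        {
            return &sctable[0];
        }
        i += sizeof(void *);
    }
    return NULL;    /* table not found in the scanned range */
}

static int init(void)
{
    printk("\nModule starting...\n");
    syscall_table = (unsigned long *)find();
    if (syscall_table != NULL)
    {
        printk("systemcall table found at %p\n", syscall_table);
    }
    return 0;
}

static void exit(void)
{
    printk("Module exiting\n");
    return;
}

module_init(init);
module_exit(exit);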
Here, inside the init method, we call the find() function. The function scans
the kernel memory range and tries to find the address of sys_call_table. To do that,
we use the sys_close symbol, which is still exported in kernel 2.6. That symbol
is the address of the kernel function that implements the close() system call (its system
call number is __NR_close). So we look for the base
address at which the entry at index __NR_close holds the value
of the sys_close symbol. As soon as the condition is satisfied, we know the actual base
address.
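The scanning idea can be tried out safely in user space on an ordinary array instead of kernel memory. The sketch below is a toy model under our own assumptions: toy_find_table(), TOY_NR_CLOSE and the toy_* functions are hypothetical names that play the roles of find(), __NR_close and the real system calls.

```c
#include <stddef.h>

#define TOY_NR_CLOSE 3   /* index of close() inside the toy table */

typedef int (*toy_fn)(void);

/* Distinct bodies so that every function has its own address. */
static int toy_read(void)  { return 0; }
static int toy_write(void) { return 1; }
static int toy_open(void)  { return 2; }
static int toy_close(void) { return 3; }

/*
 * Slide over mem[0..n) and return the first position whose
 * TOY_NR_CLOSE entry holds the known address, or NULL if none does.
 */
static toy_fn *toy_find_table(toy_fn *mem, size_t n, toy_fn known_close)
{
    size_t i;

    for (i = 0; i + TOY_NR_CLOSE < n; i++) {
        if (mem[i + TOY_NR_CLOSE] == known_close)
            return &mem[i];
    }
    return NULL;
}
```

If the four toy functions are stored at positions 4 to 7 of a zero-filled array, the scan returns the address of position 4, which is the base of the toy table. A single matching entry is a weak test, so a scan of real memory could in principle stop at a false match.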
[3] Using the ‘grep’ command
We can use the grep command to search for the sys_call_table keyword in the file
‘/boot/System.map-2.6.35-22-generic’. Here 2.6.35 is the kernel version. The grep
command gives us the address of sys_call_table. It simply prints a line on the
console, and that line includes the address. The exact command and its result
for 64 bit and 32 bit are given below.
64 Bit:
bk@ubuntu:~/lkm$ grep sys_call_table /boot/System.map-2.6.35-22-generic
ffffffff81600300 R sys_call_table
32 Bit:
bk@ubuntu:~/lkm$ grep sys_call_table /boot/System.map-2.6.35-22-generic
c05d2180 R sys_call_table
This method works for both 32 bit and 64 bit operating systems. It is the more reliable
method, as the previous method will not work if the kernel developers decide not to
export the sys_close symbol in the future.
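A line printed by grep can be turned into a numeric address with a few lines of C. The sketch below assumes the three-field layout shown in the output above (hexadecimal address, type letter, symbol name); the helper parse_map_line() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one System.map line of the form "<hex address> <type> <symbol>".
 * Returns 0 and stores the address if the symbol matches exactly,
 * -1 otherwise.
 */
static int parse_map_line(const char *line, const char *symbol,
                          unsigned long long *addr)
{
    unsigned long long value;
    char type;
    char name[128];

    if (sscanf(line, "%llx %c %127s", &value, &type, name) != 3)
        return -1;
    if (strcmp(name, symbol) != 0)
        return -1;
    *addr = value;
    return 0;
}
```

The exact comparison on the symbol name is a deliberate choice: grep matches substrings, so a longer symbol name that merely contains sys_call_table would also be printed, and the helper rejects such a line.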
3.4 Overwriting the addresses of system call
Once the system call table is located successfully, we have the base address of the system
call table. Using this base address, we can get the addresses of all the system calls
supported by the operating system from the system call table. After getting the addresses of the
system calls, we first need to save the original addresses so we can retrieve them later when
we execute the cleanup part. As the cleanup part restores the original system call
table, we need to have the original addresses of the system calls from the system call table.
The statement below assigns to the syscall_table variable the base address of the system
call table. Using that variable we can refer to the system call table.
unsigned long *syscall_table = (unsigned long *) 0xffffffff81600300;
Once the variable is set, we can use it as the base and locate the system calls that we
require. For file i/o, we would like to locate the read(), write(), open() and close()
system calls and save their addresses. The following code does that.
original_write= (void *)syscall_table[__NR_write];
original_read=(void *)syscall_table[__NR_read];
original_close=(void *)syscall_table[__NR_close];
original_open=(void *)syscall_table[__NR_open];
Here __NR_write, __NR_read, __NR_close and __NR_open are the system call numbers
corresponding to the write, read, close and open system calls. In the same way we can also
locate the network system calls such as getsockname(), getpeername(), connect(), accept(),
sendto() and recvfrom().
original_getsockname=(void *)syscall_table[__NR_getsockname];
original_getpeername=(void *)syscall_table[__NR_getpeername];
original_connect=(void *)syscall_table[__NR_connect];
original_accept=(void *)syscall_table[__NR_accept];
original_recvfrom=(void *)syscall_table[__NR_recvfrom];
original_sendto=(void *)syscall_table[__NR_sendto];
All the variables named original_** are required to keep the original system call
addresses. If we would like to call any of the original system calls from our new
system call, we can use these addresses to invoke the actual system calls. After saving
the actual addresses of the system calls, we have to overwrite them with the addresses of the new
system calls. We only overwrite those addresses for which we are replacing the
current system call with a new system call. The following code overwrites the addresses of a few
system calls.
syscall_table[__NR_open]=new_open;
syscall_table[__NR_write]=new_write;
syscall_table[__NR_read]=new_read;
syscall_table[__NR_sendto]=new_sendto;
syscall_table[__NR_recvfrom]=new_recvfrom;
syscall_table[__NR_connect]=new_connect;
syscall_table[__NR_accept]=new_accept;
Here all the tokens named new_** are the addresses of the new system calls that get
executed in place of the original system calls.
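The save, overwrite and restore steps can be put together in a small user-space sketch. It is a toy model under our own assumptions: toy_table, toy_init() and toy_cleanup() are hypothetical names, and the table is an ordinary array rather than the kernel's system call table.

```c
#include <stddef.h>

#define TOY_NR_OPEN    2
#define TOY_TABLE_SIZE 4

typedef long (*toy_call)(const char *arg);

/* Stand-in for the original system call. */
static long toy_sys_open(const char *arg) { (void)arg; return 3; }

static int toy_open_requests;       /* stand-in for the log */
static toy_call toy_table[TOY_TABLE_SIZE] = { NULL, NULL, toy_sys_open, NULL };
static toy_call toy_original_open;  /* saved by toy_init() */

/* Replacement: count the request, then call the saved original. */
static long toy_new_open(const char *arg)
{
    toy_open_requests++;
    return toy_original_open(arg);
}

/* init: remember the old entry, then point the slot at the replacement. */
static void toy_init(void)
{
    toy_original_open = toy_table[TOY_NR_OPEN];
    toy_table[TOY_NR_OPEN] = toy_new_open;
}

/* cleanup: exact reverse of init. */
static void toy_cleanup(void)
{
    toy_table[TOY_NR_OPEN] = toy_original_open;
}
```

After toy_init(), a call through toy_table[TOY_NR_OPEN] is counted and still returns the original result. After toy_cleanup(), the slot points to toy_sys_open again and calls are no longer counted.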
3.5 Writing a new system call
The following is the actual code inserted as the new system call. First we look at the example of
the open system call.
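/* Replacement for open(): logs matching requests, then calls the original. */
asmlinkage long new_open(const char __user *filename, int flags, int mode)
{
    char fileinfo_buff[200], path[120];
    int ret;

    /* Read the configuration file only on the first call. */
    if (first_open == 0)
    {
        read_config_file();
        first_open = 1;
    }
    /* Internal filter: log only files under the monitored directory. */
    if (strstr(filename, filter_list))
    {
        print_time(USER_TIME);                                          // Get Current Time
        strlcpy(fileinfo_buff, USER_TIME + 1, sizeof(fileinfo_buff));   // Store Time in Log Array
        ret = get_username(USER_NAME);
        if (ret < 0)
        {
            printk(KERN_ALERT "\n error in get_username");
        }
        else
        {
            strlcat(fileinfo_buff, USER_NAME, sizeof(fileinfo_buff));
        }
        /* Classify the request as write or read from the open flags. */
        if (flags & (O_WRONLY | O_APPEND))
        {
            strlcat(fileinfo_buff, "#WR#", sizeof(fileinfo_buff));
        }
        else
        {
            strlcat(fileinfo_buff, "#RD#", sizeof(fileinfo_buff));
        }
        strlcat(fileinfo_buff, filename, sizeof(fileinfo_buff));
        strlcat(fileinfo_buff, "\n", sizeof(fileinfo_buff));
        strlcpy(path, "/home/bk/output/fileio/", sizeof(path));
        strlcat(path, log_filename, sizeof(path));
        /* Write the record only if the user name starts with a letter. */
        if ((USER_NAME[0] >= 'A' && USER_NAME[0] <= 'Z') ||
            (USER_NAME[0] >= 'a' && USER_NAME[0] <= 'z'))
        {
            write_file(fileinfo_buff, path);
        }
    }
    /* Forward the request to the original system call. */
    return (*original_open)(filename, flags, mode);
}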
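The read/write classification used in new_open() can be tried out in user space. In the sketch below, tag_as_in_module() repeats the test from the listing above, while tag_with_accmode() is an alternative of our own (an assumption, not part of the module) that masks the flags with O_ACCMODE so that files opened with O_RDWR are also tagged as writes.

```c
#include <fcntl.h>

/* Test used in new_open(): only O_WRONLY or O_APPEND count as a write. */
static const char *tag_as_in_module(int flags)
{
    return (flags & (O_WRONLY | O_APPEND)) ? "#WR#" : "#RD#";
}

/* Alternative (our assumption, not the module's code): use O_ACCMODE. */
static const char *tag_with_accmode(int flags)
{
    int mode = flags & O_ACCMODE;

    if (mode == O_WRONLY || mode == O_RDWR || (flags & O_APPEND))
        return "#WR#";
    return "#RD#";
}
```

The two functions agree for O_RDONLY and O_WRONLY. They differ for O_RDWR, which the module's test tags as a read because the O_RDWR value shares no bit with O_WRONLY or O_APPEND.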
The new system call should receive exactly the same number of parameters as the original
system call, and in the same sequence. In other words, the signatures of the two
functions should be the same. As we can see, the new system call first checks the value of
a variable named first_open. Initially this value is 0, so the condition is satisfied only once.
For all subsequent calls the condition is not satisfied. Within the ‘if’ condition, it calls
another function, read_config_file(). The code for that function is given below.
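/* Read the monitored directory path from the configuration file into filter_list. */
void read_config_file(void)
{
    mm_segment_t old_fs;
    int fd, i;
    char buf[10];
    char filename[50] = "/home/bk/config.txt";

    /* Allow the original system calls to accept kernel-space buffers. */
    old_fs = get_fs();
    set_fs(KERNEL_DS);
    i = 0;
    fd = original_open(filename, O_RDONLY, 0);
    if (fd >= 0)
    {
        /* Copy the file one byte at a time into filter_list. */
        while (original_read(fd, buf, 1) == 1)
        {
            printk("%c", buf[0]);
            filter_list[i] = buf[0];
            i++;
        }
        filter_list[i] = '\0';
        printk("\n");
        original_close(fd);
    }
    /* Restore the previous address limit. */
    set_fs(old_fs);
}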
It simply reads the configuration file from the ‘/home/bk/config.txt’ location and puts
the data into a buffer named filter_list. The configuration file holds the absolute path
of the directory which needs to be monitored for file i/o. A sample is given below.
/home/bk/Desktop/secret
After calling read_config_file(), the new open() system call checks another condition,
given by the statement strstr(filename, filter_list). This statement checks whether
filter_list, which holds the content of the configuration file, is a substring of the
filename passed as a parameter to the open system call. Any filename that does not
satisfy the condition is not logged, and the original open system call is simply called.
A filename that satisfies the condition is an eligible candidate: the system admin wants
file i/o on that file to be logged. This acts as an internal filter. Every time the admin
wants to add more directories for monitoring, the configuration file has to be modified.
The module also needs to be reloaded, so that it reads the new configuration file and
uses the new filter_list for the internal filter.
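The filter step can be illustrated with a small userspace sketch. It is not the module's code: load_filter() and path_is_monitored() are hypothetical names, and the sketch assumes the configuration text is already in memory. It also shows one detail worth handling: read_config_file() copies every byte of the file, so if the configuration file ends with a newline, that newline becomes part of filter_list and strstr() no longer matches ordinary paths.

```c
#include <stddef.h>
#include <string.h>

/* Copy the first line of the raw configuration text into out,
 * dropping the trailing line break. Returns 0 on success, -1 if the
 * line is empty or does not fit into out (cap bytes, including NUL). */
int load_filter(const char *raw, char *out, size_t cap)
{
    size_t len = strcspn(raw, "\r\n");   /* length up to the first line break */

    if (len == 0 || len >= cap)
        return -1;
    memcpy(out, raw, len);
    out[len] = '\0';
    return 0;
}

/* The test described in the text: log only when filter is a substring
 * of filename. An empty filter matches nothing in this sketch. */
int path_is_monitored(const char *filename, const char *filter)
{
    if (filter[0] == '\0')
        return 0;
    return strstr(filename, filter) != NULL;
}
```

Rejecting an over-long line instead of truncating it is a design choice of the sketch; a truncated path would silently widen the filter.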
Once the condition is satisfied, the new system call invokes a function named
print_time(). This function calls the kernel-mode function do_gettimeofday(), which
returns the current time. The time is given as a total number of seconds, so further
calculation is required to convert it into H:M:S format, where H is the hour, M the
minutes and S the seconds. The new system call puts this time stamp into the buffer. It
then calls another function, get_username(). This function reads the process id from the
current->pid value and then reads the file ‘/proc/pid_value/environ’, where pid_value is
the process id. This file contains all the environment variables used by the process.
One of these environment variables is ‘USERNAME’, which indicates the current user of the
process. The username is important information that the new system call has to log.
After retrieving the username, the system call appends it to the buffer in which the time
stamp was stored previously.
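The two helper steps described above can be sketched in userspace C. This is only an illustration under stated assumptions: format_hms() and find_env() are hypothetical names, not the module's print_time() and get_username(). The sketch ignores time zones, so the hour is in UTC, and it takes the environ data as an in-memory buffer of NUL-separated KEY=value entries, which is the layout of /proc/<pid>/environ.

```c
#include <stdio.h>
#include <string.h>

/* Convert seconds since the epoch into "HH:MM:SS" (UTC, no time-zone
 * adjustment). out should hold at least 9 bytes. */
void format_hms(long total_seconds, char *out, size_t cap)
{
    long secs_today = total_seconds % 86400;   /* seconds since midnight */
    long h = secs_today / 3600;
    long m = (secs_today % 3600) / 60;
    long s = secs_today % 60;

    snprintf(out, cap, "%02ld:%02ld:%02ld", h, m, s);
}

/* Search an environ-style block (NUL-separated "KEY=value" entries,
 * len bytes long) for key. Returns a pointer to the value, or NULL. */
const char *find_env(const char *block, size_t len, const char *key)
{
    size_t klen = strlen(key);
    size_t pos = 0;

    while (pos < len) {
        const char *entry = block + pos;
        const char *end = memchr(entry, '\0', len - pos);
        size_t elen;

        if (end == NULL)
            break;                             /* unterminated last entry */
        elen = (size_t)(end - entry);
        if (elen > klen && strncmp(entry, key, klen) == 0 && entry[klen] == '=')
            return entry + klen + 1;
        pos += elen + 1;                       /* skip the entry and its NUL */
    }
    return NULL;
}
```

Checking for the '=' directly after the key keeps a lookup for a shorter name from matching a longer variable that merely starts with it.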
The system call then analyzes the flags parameter. If the value of the flags parameter is
O_WRONLY or O_APPEND, the call is a write operation and is stored as ‘WR’ in the buffer.
For any other value, the operation is treated as a read and is stored as ‘RD’. The system
call then reads the parameter called ‘filename’, which is the absolute path of the file
that is going to be read or written. The buffer now carries all the information that
needs to be stored in the log file. To write this information to the log file, the system
call invokes another function.
write_file(fileinfo_buff,path);
Here fileinfo_buff is the buffer that holds all the required information, and path is the
absolute path at which the log file should be generated. This function invokes the
original write system call to write the buffer to the log file. After writing, it closes
the file using the original close() system call. The following code calls the original
write call and passes the required parameters.
fd = original_open(path, O_WRONLY|O_CREAT|O_APPEND,0777);
if (fd >= 0)
{
    original_write(fd,buffer,strlen(buffer));
    original_close(fd);
}
else
{
    printk(KERN_ALERT "\n Error in write_file() while opening a file");
}
In this way we replace the open() system call with the new_open() system call, so that
every request to open() is redirected to new_open(). Inside the new system call, we
extract the required information from the parameters, log it to a file, and then call the
original system call.
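One detail of the flag test in new_open() deserves a note. On Linux, O_RDWR is a separate access-mode value rather than a combination of the O_RDONLY and O_WRONLY bits, so the expression flags & (O_WRONLY|O_APPEND) does not detect a file opened with O_RDWR, and such a call is logged as ‘RD’. A possible alternative, sketched here in userspace C with the hypothetical name classify_open(), masks the access mode with O_ACCMODE. Treating read-write opens as ‘WR’ is an assumption of this sketch, not something the project specifies.

```c
#include <fcntl.h>

/* Return "WR" when the open flags allow writing, otherwise "RD".
 * O_ACCMODE masks out the access-mode bits (O_RDONLY, O_WRONLY, O_RDWR). */
const char *classify_open(int flags)
{
    int mode = flags & O_ACCMODE;

    if (mode == O_WRONLY || mode == O_RDWR)
        return "WR";
    return "RD";
}
```

Because the sketch looks only at the access mode, O_APPEND on its own no longer changes the result; a file has to be opened for writing before it can be appended to.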
Using the same method, we can also replace any of the network system calls. Following is
the code that intercepts the network system call connect(). Here the actual system call
is replaced with the new_connect() system call.
asmlinkage long new_connect(int fd, struct sockaddr __user *buff1, int flag)
{
    int ret, ret1, ret2,fc;
    struct sockaddr_in getsock, getpeer;
    struct sockaddr_in *getsock_p, *getpeer_p;
    int socklen;
    char netinfo_buff[200], path[120];
    char buff[100];
    mm_segment_t old_fs;

    socklen=sizeof(getsock);
    /* Let the original system calls write into kernel-space buffers */
    old_fs=get_fs();
    set_fs(KERNEL_DS);
    ret1=original_getsockname(fd,(struct sockaddr *)&getsock,&socklen);
    getsock_p=&getsock;
    ret2=original_getpeername(fd,(struct sockaddr *)&getpeer,&socklen);
    getpeer_p=&getpeer;
    set_fs(old_fs);
    if(getsock.sin_family==AF_INET)
    {
        char *s1=inet_ntoa(getsock.sin_addr);
        char *s2=inet_ntoa(getpeer.sin_addr);
        /* Internal filter: skip identical, unset and 192.168 addresses */
        if((strcmp(s1,s2)) && strcmp(s1,"0.0.0.0") && strcmp(s2,"0.0.0.0") &&
           !(strstr(s1,"192.168")) && !(strstr(s2,"192.168")))
        {
            print_time(USER_TIME);
            strcpy(netinfo_buff,USER_TIME+1);
            ret=get_username(USER_NAME);
            if(ret < 0)
            {
                printk(KERN_ALERT "\n error in get_username");
            }
            else
            {
                strcat(netinfo_buff,USER_NAME);
            }
            /* Append #Connect#local ip#local port#peer ip#peer port */
            snprintf(buff,9,"#%s","Connect");
            strcat(netinfo_buff,buff);
            snprintf(buff,18, "#%s",inet_ntoa(getsock.sin_addr));
            strcat(netinfo_buff,buff);
            snprintf(buff,10,"#%u",my_ntoh(getsock.sin_port));
            strcat(netinfo_buff,buff);
            snprintf(buff,18,"#%s",inet_ntoa(getpeer.sin_addr));
            strcat(netinfo_buff,buff);
            snprintf(buff,10,"#%u\n",my_ntoh(getpeer.sin_port));
            strcat(netinfo_buff,buff);
            strcpy(path,"/home/bk/output/network/");
            strcat(path,log_filename);
            write_file(netinfo_buff,path);
        }
    }
    return original_connect(fd,buff1,flag);
}
Here the new_connect() system call receives the same parameters as the actual connect()
system call. It uses two original system calls to gather the information: getsockname()
and getpeername(). getsockname() retrieves the locally bound name of the specified
socket, and getpeername() retrieves the peer address of the specified socket. Both store
the address in the sockaddr structure pointed to by the address argument. The new system
call then looks for sockets of type AF_INET, which indicates the Internet Protocol (IP).
It then checks a further condition: neither IP address may be 0.0.0.0, neither may start
with 192.168, and the source IP and destination IP must not be the same. In this way the
internal filter drops a few entries that do not need to be logged.
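The filter rules above can be sketched as a standalone userspace function. This is an illustration under assumptions, not the module's code: should_log_connection() is a hypothetical name, and the sketch compares the numeric addresses instead of their printed form. Comparing numerically expresses the "starts with 192.168" rule exactly, whereas a substring search for "192.168" in the printed address would also match an address such as 10.192.168.1.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Decide whether a local/peer IPv4 pair should be logged, following the
 * rules in the text: skip unset (0.0.0.0) addresses, skip 192.168.x.x
 * addresses, and skip pairs where both ends are the same address. */
int should_log_connection(struct in_addr local, struct in_addr peer)
{
    uint32_t l = ntohl(local.s_addr);   /* host byte order for shifting */
    uint32_t p = ntohl(peer.s_addr);

    if (l == p)
        return 0;
    if (l == 0 || p == 0)
        return 0;
    if ((l >> 16) == 0xC0A8 || (p >> 16) == 0xC0A8)   /* 192.168.0.0/16 */
        return 0;
    return 1;
}
```

Working on the binary addresses also avoids depending on how the address-to-string helper manages its result buffer.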
After collecting all the information in the buffer, new_connect() does exactly what
new_open() does: it uses the actual write system call to write the retrieved information
to the file. The code for the other network i/o system calls, such as new_accept(),
new_sendto() and new_recvfrm(), follows the same pattern.
3.6 Cleanup part of the module
When the module is unloaded, the kernel executes the cleanup part of the module. The
following code represents the cleanup part.
syscall_table[__NR_open]=original_open;
syscall_table[__NR_write]=original_write;
syscall_table[__NR_read]=original_read;
syscall_table[__NR_sendto]=original_sendto;
syscall_table[__NR_recvfrom]=original_recvfrom;
syscall_table[__NR_connect]=original_connect;
syscall_table[__NR_accept]=original_accept;
It simply replaces the addresses of the new system calls in the system call table with
the addresses of the original system calls. The entire module can therefore be removed
without affecting the kernel itself. After the module is unloaded, the kernel continues
executing with the original system calls.
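The save, replace and restore cycle can be demonstrated outside the kernel with an ordinary array of function pointers. Everything in this sketch is hypothetical (demo_table, saved_table, hook_install(), hook_remove() and the two toy calls); it only models the bookkeeping and does not touch a real system call table.

```c
#include <stddef.h>

typedef long (*syscall_fn)(long arg);

#define TABLE_SIZE 4
#define DEMO_NR    1

/* Stand-in for the kernel's system call table (userspace model only) */
syscall_fn demo_table[TABLE_SIZE];
/* Original entries, remembered so that cleanup can restore them */
syscall_fn saved_table[TABLE_SIZE];
/* Counts how often the spy function was entered */
int spy_hits = 0;

/* Install hook at slot nr, remembering the entry it replaces */
void hook_install(int nr, syscall_fn hook)
{
    saved_table[nr] = demo_table[nr];
    demo_table[nr] = hook;
}

/* Cleanup: put the remembered original entry back into slot nr */
void hook_remove(int nr)
{
    if (saved_table[nr] != NULL) {
        demo_table[nr] = saved_table[nr];
        saved_table[nr] = NULL;
    }
}

/* Toy "original" call */
long original_call(long arg)
{
    return arg * 2;
}

/* Toy spy call: record the request, then forward to the original */
long spy_call(long arg)
{
    spy_hits++;
    return saved_table[DEMO_NR](arg);
}
```

After hook_remove() the table entry again points at original_call(), which mirrors the statement above that the kernel continues with the original system calls once the module is gone.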
3.7 Output of phase-I
Once the LKM is loaded, all requests to the intercepted system calls are redirected to
our new system calls, which extract the required information and write it to the log
file. The name of the log file is based on the current date of the system. The log file
can be created at any location on the system by defining the proper path in the new
system calls. Sample output of the file i/o log and the network i/o log is given below.
Sample of log file for file i/o:
22:23:35#bk#RD#/home/bk/Desktop/secret/hello.txt
22:23:38#bk#WR#/home/bk/Desktop/secret/hello.txt
23:55:03#bk#WR#/home/bk/Desktop/shared_docs/temp.txt
23:56:03#bk#WR#/home/bk/Desktop/shared_docs/temp.txt
Sample of log file for network i/o:
23:39:55#bk#RECEIVE#firefox-bin#192.168.188.138#34475#174.76.227.118#80
23:39:56#bk#Connect#ssh#192.168.188.138#0#74.125.224.116#39519
23:39:57#bk#SEND#ssh#192.168.188.138#43545#74.125.224.116#53
23:39:58#bk#RECEIVE#ssh#192.168.188.138#43545#74.125.224.116#53
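Because every field is separated by ‘#’, a log line is easy to split again. The following userspace sketch parses one file i/o line; the struct fileio_entry type, its field sizes and the function name parse_fileio_line() are assumptions made for the illustration.

```c
#include <stdio.h>
#include <string.h>

struct fileio_entry {
    char time[16];
    char user[32];
    char op[8];
    char path[256];
};

/* Parse "HH:MM:SS#user#OP#/abs/path" into e.
 * Returns 0 on success, -1 if the line does not have four fields. */
int parse_fileio_line(const char *line, struct fileio_entry *e)
{
    /* %[^#] reads up to the next '#'; the widths keep each field in bounds */
    int n = sscanf(line, "%15[^#]#%31[^#]#%7[^#]#%255[^\n]",
                   e->time, e->user, e->op, e->path);

    return n == 4 ? 0 : -1;
}
```

A line with an empty field, for example one written when the username could not be determined, is rejected by this parser rather than partially filled in.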
Chapter 4
A WEB-BASED APPLICATION FOR MONITORING FILE I/O AND NETWORK I/O
4.1 Why filtering is required?
The log files generated in the system phase contain a large number of entries, because
every request is logged. When a user reads or writes a file, the log file does not
receive a single entry for that request; it receives multiple entries for different
files. This is because, to service a single file system call, the OS has to access
several system files, and those accesses are logged as well. This can increase the size
of the log file dramatically, and the log file ends up holding many additional records
besides the user's own file access requests. We do not need all of this data, and it
occupies a lot of disk space. The only data we need is the user process data, so we have
to filter those entries out of the log file. After filtering, the original log file can
be deleted, because the remaining entries do not need to be kept.
For the network log files, no entries need to be deleted. We log all network i/o and do
not know in advance which IP addresses are of interest and which are not, so we cannot
apply filtering to the network log files. This can be considered a limitation, but the
network data can still be filtered afterwards based on some constraints: the system admin
can impose different constraints and restrict the data to a limited set of entries.
4.2 Automation schema
To implement the filtering capability, we need scripts that read the log file and keep
only the data entries that satisfy the constraints. The automation scripts can be written
in any scripting language; here all the scripts are written in Perl. Perl is a
lightweight scripting language with powerful functionality that is easy and
straightforward to understand. The overall automation schema is given in the following
figure.
Figure 2 Automation schema
The automation schema is divided into multiple layers, with different scripts running at
different layers. Two input files are required initially. One is the log file generated
by the new system call in the system phase. The other is the filter information file,
which is created by the system admin. This file holds the absolute file paths of only
those files for which we want the new system call to record file i/o. The automation
script therefore keeps only those entries of the log file that are mentioned in the
filter information file. Once we have the filtered log file, called final_log, its
entries need to go into the database. In the MySQL database there are separate tables for
file i/o and network i/o information. One Perl script keeps running in the background and
dumps the final_log entries into the corresponding database table. Once the values are in
the database, the system admin can view the data through the web interface, which is
developed in PHP. The admin can run different queries with different constraints and view
the data in a browser.
4.3 Filtering Script
This script reads two files: the log file and the filter information file. It searches
the log file for the entries given in the filter information file. First we have to
locate the log file. Each log file is named after the current date, so we first get the
current date and then look for the file whose name is that date. The following Perl code
gives us the current date.
@timeData = localtime(time);
$year=$timeData[5]+1900;
# localtime() counts months from 0, so add 1 to get the calendar month
$month=$timeData[4]+1;
if($month < 10){ $month ="0".$month;}
$day=$timeData[3];
if($day < 10){$day = "0".$day;}
$today_date =$month."_".$day."_".$year;
print "\nfinal date is :".$today_date;
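Since the log file is named by the kernel module and located by the script, both sides have to build the same month_day_year string. As a cross-check, the same pattern can be produced in C from a broken-down time. The function name format_log_date() is hypothetical, and the sketch assumes the caller supplies a struct tm, for example from localtime().

```c
#include <stdio.h>
#include <time.h>

/* Format a broken-down time as "MM_DD_YYYY".
 * Returns the number of characters snprintf() reports. */
int format_log_date(const struct tm *tm, char *out, size_t cap)
{
    /* tm_mon counts from 0 and tm_year from 1900, as in Perl's localtime */
    return snprintf(out, cap, "%02d_%02d_%04d",
                    tm->tm_mon + 1, tm->tm_mday, tm->tm_year + 1900);
}
```

The zero-padded %02d conversions replace the explicit "less than 10" checks used in the Perl version.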
Once we have the current date, we can use it to read the file named after that date. We
also have to read the filter information file, which is created by the system admin. The
following code reads a file line by line and stores its content in an array. Each element
of the array holds a single line of the file.
$log="log.txt";
open(LOGINFO,$log) or die "Cannot open $log: $!";
@filter_this=<LOGINFO>;
close(LOGINFO);
Here each element of the array @filter_this carries a single line of the filter
information file. We can print the content of the entire array using the following line.
print "@filter_this";
The same code can be used to read both files: the log file and the filter information
file. Once the content is stored in two arrays, we scan the first array and look for
entries that match the entries of the second array. To match strings in the file content,
Perl’s regular expression facility is used. Perl has very good support for regular
expressions, so we can easily search for one string within another. The following code
iterates through both arrays and looks for matches between the elements of the first
array and the elements of the second array.
for($i=0; $i<=$#filter_this;$i++)
{
    # Drop the trailing newline and skip empty filter lines
    chomp($string=$filter_this[$i]);
    next if $string eq "";
    for($j=0; $j<=$#lines; $j++)
    {
        # \Q...\E matches the path literally, not as a pattern
        if($lines[$j]=~ /\Q$string\E/)
        {
            print WFILE $file_date."#".$lines[$j];
        }
    }
}
Here the test ($lines[$j]=~ /\Q$string\E/) checks whether the line $lines[$j] contains
the string $string; the \Q...\E markers make Perl treat the path literally, so that
characters such as ‘.’ are not interpreted as regular expression operators. If a match is
found, the line is written to another file, which is the final log file. If no match is
found, the search continues with the next array element. After the entire iteration, the
new file named final_log holds all the matched entries. The original log file is no
longer needed and would consume a lot of disk space, so it has to be removed. The
following code does that.
$result= unlink($file);
Here $file is the name of the log file that is to be deleted. A new log file is created
immediately by the new system call, so later we can scan the new log file and follow the
same procedure again. The entire script should be executed periodically to keep filtering
the data. For that we add code that puts the script to sleep for some time and runs it
again when it wakes up. The following code puts the script into a continuous loop in
which it sleeps for a fixed time after each pass and then repeats the same procedure.
while(1)
{
    #Script statements as above;
    sleep 120;
}
Here all the statements in the loop are executed once, and the script then sleeps for 120
seconds. When it wakes up, it executes the entire cycle again, so the script effectively
runs every 2 minutes. The script produces the final filtered log file, which is named
report_current_date.txt.
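The matching step of the filtering script does not depend on Perl. As an illustration, the same selection can be sketched in C with a plain substring test. The names line_matches() and filter_lines() are hypothetical, and the sketch differs from the script in one respect: a log line is kept once, even if it matches several filter entries.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if line contains any of the nfilters filter strings, else 0.
 * Empty filter strings are ignored so that they cannot match everything. */
int line_matches(const char *line, const char *const *filters, size_t nfilters)
{
    for (size_t i = 0; i < nfilters; i++) {
        if (filters[i][0] != '\0' && strstr(line, filters[i]) != NULL)
            return 1;
    }
    return 0;
}

/* Store pointers to the matching lines in kept (room for nlines entries)
 * and return how many lines were kept. */
size_t filter_lines(const char *const *lines, size_t nlines,
                    const char *const *filters, size_t nfilters,
                    const char **kept)
{
    size_t count = 0;

    for (size_t j = 0; j < nlines; j++) {
        if (line_matches(lines[j], filters, nfilters))
            kept[count++] = lines[j];
    }
    return count;
}
```

Keeping each line at most once avoids duplicate rows later on if two filter paths overlap, for example a directory and one of its subdirectories.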
4.4 Insert data script
Once the final log file has been created, its entries need to be dumped into the
database. The database holds the log information in tables, which makes the data easy to
manage and query. One more script is therefore required that periodically dumps the data
from the final log files into the database. MySQL provides a facility to load data from a
text file into the corresponding table. The text content must be in the exact expected
format, with a valid field separator and line separator. This can be done with a single
query supported by MySQL. The insert data script first creates a connection to the
database and then runs the query that loads the file data into the table. Once all the
data has been inserted, it closes the connection. This script is also invoked
periodically, in the same way as above. The code for the script is given below.
$dbh = DBI->connect('dbi:mysql:10_21_syscall','root','root')
    or die "Connection Error: $DBI::errstr\n";
$sql = "LOAD DATA LOCAL INFILE '/home/bk/output/file/".$outputfile_name."' INTO TABLE fileio"
    ." COLUMNS Terminated by '#' Lines terminated by '\n';";
$sth = $dbh->prepare($sql);
$sth->execute
    or die "SQL Error: $DBI::errstr\n";
$result= unlink($outputfile_name);
All the entries in the file are inserted as rows in the ‘fileio’ table. When this is
done, the script also deletes the final report. The filter data script will generate a
file of the same kind again, and the next time the insert data script is executed it will
load the new data into the database. The reason for deleting the final log file is that
we do not want duplicate entries in the database: once the file has been deleted, the
newly created file contains only new entries, and the database stays in a consistent
state.
4.5 Database Design
There are two tables for storing the log file information: one for file i/o and one for
network i/o. The information stored in each table is given below.
Table 5 Information captured by hacking file i/o and network i/o
File i/o                   Network i/o
date                       date
time                       time
user                       user
operation (read/write)     operation (connect, accept, sendto, recvfrm)
filepath                   filename
                           source ip
                           source port number
                           destination ip
                           destination port number
4.6 Web interface
The web interface has been developed in PHP. The first page is a form in which the user
is required to enter the date range, the time range and the type of operation as
mandatory fields. The user can also add a username and the absolute path of a file to
filter the data further. Once all the details are provided, the user can press the submit
button. PHP uses this data to build the query string and runs the query against the
database. The next page receives the result data and renders it as an HTML table, so that
the user can view the data as a report.
Figure 3 Screenshot of Form to query the database
Figure 4 Screenshot of result of query
Chapter 5
SUMMARY
5.1 Summary
Such an LKM can be very useful for monitoring user activities on the local machine as well as on
a remote machine. On a central repository we can monitor every file access request by logging
the required information. The system admin can check the log file and find out which files a user
has touched through local or remote access to the system. One can also take advantage of the
network i/o logging to monitor access to a specific server by providing the server's IP address
in the configuration file for the LKM.
An LKM can slow the system down, because every system call request goes through the new
system call and logging the information requires additional disk i/o. As a result, the response
time of the system can increase. This is a trade-off between using the LKM for security purposes
and keeping the response time of the system low. To reduce the impact, we can keep the LKM as
lightweight as possible and use it where response time is not the primary concern.
Chapter 6
FUTURE WORK
6.1 Future work
Using the LKM we can hack other system calls as well; which ones depends on what information
we need to collect and how we intend to use it. If the design of the available system calls
changes in the future, the LKM will also need to be ported to the latest kernel source, with
whatever changes are needed to make it work on the latest OS.
The LKM is a very powerful facility offered by the operating system. It can be used to implement
other utilities that deal with system calls. One could also implement a central system that
collects the log file data from across the network and shows file i/o and network i/o snapshots
of the different machines on a shared network.