Memory Layout
in Program Execution
By
Frédérick Giasson
fred@decatomb.com
http://www.decatomb.com
October 2001
Copyright 2001, Frédérick Giasson
Table of Contents:
------------------
Preface
Introduction
Chapter 1:
1:... Program execution: The route map.
1.1:.... The pseudo-shell code.
1.2:.... Create a child Process.
1.2.1:..... FORK system call: the nine steps.
1.3:.... Execute the program.
1.3.1:..... C run-time, start-off procedure.
1.3.2:..... EXEC system call, the nine steps.
Chapter 2:
2:... Memory layout in executed program.
2.1:.... Dissection of the ELF executable file.
Chapter 3:
3:... The Stack and the Heap.
3.1:.... Where are they?
3.2:.... How to know what's the size of the user stack frame at compilation?
3.3:.... Registers.
3.3.1:..... Generals Registers.
3.3.2:..... Segments register.
3.3.3:..... Offsets Registers.
3.4:.... The stack.
3.4.2:..... Stack management during procedure calling.
3.4.2.1:...... The call.
3.4.2.2:...... The Prologue.
3.4.2.3:...... The return.
3.5:.... The Heap.
Conclusion
Annex 1
Annex 2
Annex 3
Bibliography
Preface:
--------
This is my first article published on the web. It is written for beginner
and intermediate system administrators, programmers, kernel developers, hobbyists,
and any other computer enthusiasts interested in the subject.
I'm sorry for my bad English. If you find grammar errors and are
willing to report them, please contact me and I'll correct the text with pleasure.
If you wish to discuss this article with me and other readers, please
join us on the discussion forum dedicated to this article:
- http://www.decatomb.com/forums/viewforum.php?forum=29
Introduction:
-------------
Memory management is a hot topic in operating system development.
Memory is an important system resource and needs to be managed carefully.
This paper won't discuss the whole memory management (MM)
system. Instead, we'll look at the MM system from the user's point of view: we'll
see how a program file is executed and mapped into memory.
There are many other parts in the MM system, such as swapping, virtual
memory, page replacement algorithms, segmentation, etc. If you wish to
understand the whole MM system of an operating system,
look at the bibliography at the bottom of this document. It lists many useful
resources about the Linux/Unix and Minix memory management systems.
I'll use the Minix and Linux memory management systems to explain
how this process works. The scheme is practically the same on other
ELF-based IA-32 operating systems like NetBSD, FreeBSD, etc. Every demonstration
program will be compiled with GCC under Linux, and I'll use GDB to examine
the assembly code of our demonstration programs to show you how it works at a
low level.
Chapter 1: Program execution: The route map:
--------------------------------------------
It's interesting to know how an executed program is mapped in memory,
but how is it executed? In the first part of this paper I'll show you how the
whole execution process of a program works by tracing the route map. The starting
point is when the user types the program name in a shell and presses < Enter >. The
final step of the route is when the program is mapped in memory and ready to start.
-----
[Root@Seldon prog]# helloworld < Enter >
Hello World!
-----
Okay, I typed the name of the program to execute ( helloworld ) and pressed
the < Enter > key. What happens between the time I press < Enter > and the
appearance of the "Hello World!" string on my console screen? Is this magic? Certainly
not!
1.1: The pseudo-shell code:
---------------------------
Here is the pseudo-code of a very basic shell program:
-----
#define TRUE 1
ARRAY command
ARRAY parameters
while(TRUE)
{
    read_command_line(command,parameters); /* We are waiting to read an input
                                              from the terminal.
                                              In our example, command = "helloworld"
                                              and parameters = "" */
    pid_t = fork();                        /* pid_t contains the process ID of the
                                              child process */
    if( pid_t != 0 )                       /* If the PID isn't 0 then...
                                              Note: The PID == 0 when it's the
                                              child's thread of execution. */
    {
        /* Parent code. */
        waitpid(-1,&status,0);             /* The program waits for the end of the
                                              child's execution ( -1 ) before
                                              continuing the parent process
                                              ( the shell ) */
    }else{
        /* Child code. */
        execve(command,parameters,0);      /* Finally, we execute our helloworld
                                              program! */
    }
}
-----
Note 1: The execve() function is called in the child process ( where fork() returns 0 )
and the waitpid() function is called in the parent process ( where fork() returns the
child's PID ).
-----
If you don't understand yet, don't worry: I'll explain every part
of this pseudo-shell below.
1.2: Create a child process:
----------------------------
First, the shell needs to create a new process to handle the execution of
our program. Here are the ways to do this under Minix and Linux:
-----
Minix:
------
do_fork()

Linux:
------
fork()
vfork()
clone()
-----
Under Linux, vfork() and clone() serve the same purpose as fork() but with some
differences in process management. See the Linux man pages for more information
about these two functions ( "man clone" or "man vfork" ). We'll concentrate our efforts
on the fork() function, which does the same thing under Minix and Linux.
This function creates an exact duplicate of the parent process, including all the
file descriptors, variables, registers, everything. After the fork() call, the child
and the parent process go their separate ways. The values of all variables are the same
at the time of the fork() call but, after it, the parent's and the child's variables
evolve independently: changes made in the parent process do not affect the child
process, and vice versa. The only thing that is shared between the parent and the child is the
text section, which is read-only.
Okay, the FORK system call has been sent and the child process is created. Before the
fork() function ends, the system call returns to the program the
child's process identifier ( the PID ).
Note 1: The fork() procedure issues the FORK system call. Don't
mix up these two concepts.
The PID is a signed integer type defined in types.h as:
-----
Minix:
Source: minix/include/sys/types.h
-
typedef int pid_t;

Linux:
Source: /posix/sys/types.h
-
#ifndef __pid_t_defined
typedef __pid_t pid_t;    /* __pid_t is itself defined as an int */
-----
1.2.1: FORK system call: the nine steps:
----------------------------------------
Here are the nine things that the FORK system call does:
Note 1: You'll find each of these steps in the do_fork() source code
        in Annex 1.
Note 2: The code for the Linux fork() function is in glibc, in the
        fork.c file.
Note 3: The descriptions of these steps apply only to the Minix
        operating system. The basis is the same for Linux. The only thing
        that really differs is how the process is created by the kernel.
1- Check to see if the process table is full. ( Lines 14 to 16 in Annex 1 )
   Okay, what the hell is this process table?
   The process table is a part of the kernel. The declaration
   of the table is in Annex 2. This table contains each
   process's registers, memory map, accounting, and messages to
   send and receive.
   Note 1: The number of slots in the process table is defined
           by NR_PROCS in " /include/minix/config.h ":
           -----
           #define NR_PROCS 32
           -----
   Note 2: In Linux, the maximum number of processes is the size
           of the task vector; by default it has 512 entries.
2- Try to allocate memory for the child's data and stack. ( Lines 21 to 26 in Annex 1 )
3- Copy the parent's data and stack to the child's memory. ( Lines 28 to 34 in Annex 1 )
4- Find a free process slot and copy the parent's slot to it. ( Lines 36 to 38 in Annex 1 )
5- Enter the child's memory map in the process table. ( Lines 41 to 57 in Annex 1 )
6- Choose a PID for the child. ( Lines 60 to 69 in Annex 1 )
   Note 1: Don't forget, pid_t must be a signed integer.
7- Tell the kernel and the file system about the child. ( Lines 71 and 72 in Annex 1 )
   Why does the FORK system call tell both the kernel AND the file system
   about the newly created process? Because in Minix, process
   management, memory management and file management are each
   handled by separate modules. So the process table is
   partitioned into three parts, each holding the fields that
   its module needs. The part of the process table involved in memory
   management is defined in the file "/src/mm/mproc.h". The part
   involved in the file system is defined in the file "/src/fs/fproc.h".
   Finally, the part used by the kernel is shown in Annex 2.
8- Report the child's memory map to the kernel. ( Lines 74 to 77 in Annex 1 )
9- Send reply messages to the parent and the child. ( Lines 80 and 81 in Annex 1 )
   Note 1: The return value to the child process is 0 and the return
           value to the parent process is the PID of the child.
As we can see, the first part of executing a program from a shell isn't
so simple. The FORK system call only creates a new process to host the execution
of our new program, which is then started by execve(). Still, the whole process
management protocol is involved when we call the do_fork() procedure, at every level of
the system ( the process management layer, the I/O tasks, the server processes
( FS, MM and network ) and finally the user processes ). I won't discuss this
protocol in this paper because that is not its goal.
1.3: Execute the program:
-------------------------
Okay, we created a new process; now we'll use this process to execute our
program. The execve() function issues a new system call known as the "EXEC system call".
What does this system call do? It replaces the current memory image with a new one and
sets up a new stack for this new memory image.
Here are the ways to execute a program under Minix and Linux:
Minix:
------
do_exec()
Note 1: In the /src/mm/exec.c file. There are other do_exec() functions in
        /src/fs/misc.c and /src/kernel/system.c, but I'll talk about them
        later in this section.
Linux:
------
execve()
There are other variants in the exec family of procedures; see the man pages for more
information:
execl()
execlp()
execle()
execv()
execvp()
Okay, let's assume that there is a hole in memory of the size of our
new image. I'll first show you how the program is handled by the EXEC
system call and then show you all the steps that the do_exec() function performs.
As with the FORK system call, keep in mind that the following explanation
applies fully only to the Minix operating system, but the basis is the same
under Linux. The handling of the EXEC system call is the same; under
Linux, the kernel, MM and FS may handle the details differently, with many
features specific to Linux, but the overall process is the same.
Here are the memory diagrams that we'll use to understand the whole process
of EXEC when we pass the " mv hw pr " command to our shell.
Note 1: This command renames the file "hw" to "pr".
Arrays passed to execve()
-------------------------
Argument
Array
 ---------------
|       0       |
|---------------|
|      pr       |
|---------------|
|      hw       |
|---------------|
|      mv       |
 ---------------
Figure 1.0

Environment
Array
 ---------------
|       0       |
|---------------|
|  HOME=/root   |
 ---------------
Figure 1.1
The stack built by execve()
---------------------------
       3   2   1   0
      ---------------
 40  | \0| t | o | o |
     |---------------|
 36  | r | / | = | E |
     |---------------|
 32  | M | O | H | \0|
     |---------------|
 28  | r | p | \0| w |
     |---------------|
 24  | h | \0| v | m |
     |---------------|
 20  |       0       |
     |---------------|
 16  |      33       |
     |---------------|
 12  |       0       |
     |---------------|
  8  |      30       |
     |---------------|
  4  |      27       |
     |---------------|
  0  |      24       |
      ---------------
Figure 1.2
The stack after relocation by the memory manager:
-------------------------------------------------
       3   2   1   0
      ---------------
6532 | \0| t | o | o |
     |---------------|
6528 | r | / | = | E |
     |---------------|
6524 | M | O | H | \0|
     |---------------|
6520 | r | p | \0| w |
     |---------------|
6516 | h | \0| v | m |
     |---------------|
6512 |       0       |
     |---------------|
6508 |     6525      |
     |---------------|
6504 |       0       |
     |---------------|
6500 |     6522      |
     |---------------|
6496 |     6519      |
     |---------------|
6492 |     6516      |
      ---------------
Figure 1.3
The stack as it appears to main() at the start of execution:
------------------------------------------------------------
       3   2   1   0
      ---------------
6532 | \0| t | o | o |
     |---------------|
6528 | r | / | = | E |
     |---------------|
6524 | M | O | H | \0|
     |---------------|
6520 | r | p | \0| w |
     |---------------|
6516 | h | \0| v | m |
     |---------------|
6512 |       0       |
     |---------------|
6508 |     6525      |
     |---------------|
6504 |       0       |
     |---------------|
6500 |     6522      |
     |---------------|
6496 |     6519      |
     |---------------|
6492 |     6516      |
     |---------------|
6488 |     6508      | <-- envp
     |---------------|
6484 |     6492      | <-- argv
     |---------------|
6480 |       3       | <-- argc
     |---------------|
6476 |    return     |
      ---------------
Figure 1.4
I'll now explain how these stack representations work, and afterwards
I'll show you all the steps of the EXEC system call that finally get our program mapped in
memory.
There are two important arrays in the EXEC process: the argument
array (Figure 1.0) and the environment array (Figure 1.1). The argument array
is an array of argument strings passed to the new program. The
environment array is an array of strings passed as the environment of the new program. Both
arrays must be terminated with a NULL pointer ( the 0 entry in the figures ). The do_exec()
procedure first builds the initial stack within the shell's address space
(Figure 1.2, Annex 3 lines 049 to 056). Next, the procedure calls the MM,
which allocates new memory for the newly created stack and releases the old
memory (Annex 3 lines 062 to 066). Then the procedure patches up the pointers
(Annex 3 line 077), so that the memory from Figure 1.2 now looks like
Figure 1.3. Finally, we save the offset to the initial argc (Annex 3 line
112). The initial stack pointer is part of the process table in the memory
management system: there is a pointer to the initial stack in the MPROC structure
in the "/src/mm/mproc.h" file. The memory finally looks like Figure 1.4. This
is the stack as it appears to the main() procedure at the start
of execution!
1.3.1: C run-time, start-off procedure:
---------------------------------------
Okay, now the program is mapped and executed. However, we have a little
problem. For the C compiler, main() is just another function. The compiler
doesn't know that this function is the entry point of our program! So
the compiler compiles main() to access its three parameters
using the standard C calling convention, last parameter first. In this case,
there are supposed to be three parameters ( one integer and two pointers ) just above
the return address, but this is not the case in our Figure 1.3. So how
can we pass the three parameters to the main() function? With a small piece of
assembly code inserted at the very front of our program. This code is
called the C run-time start-off procedure, CRTSO, and its general goal is to put three
more dwords on the stack and call the main() function with a standard call instruction:
-DWord 1:
   ARGC: The number of parameters passed to the function.
   Type: Integer
   Note 1: Address 6480 on Figure 1.4
-DWord 2:
   ARGV: Pointer to the array of parameter strings.
   Type: Pointer
   Note 1: Address 6484, pointing to 6492 on Figure 1.4
-DWord 3:
   ENVP: Pointer to the array of environment strings.
   Type: Pointer
   Note 1: Address 6488, pointing to 6508 on Figure 1.4
These three dwords are represented in Figure 1.4. Okay, there is an assembly
procedure called CRTSO, but what does this procedure look like? Let the hunt begin! The
GDB hunting ground is now open!
Let us first load our test program ("crtso") in GDB.
-----
(gdb) file crtso
Reading symbols from crtso...done.
-----
Okay, now we have no idea where to start looking for this legend in the
hunting ground. So let us start at point 0, main(), the only point that we know
exists.
-----
(gdb) disassemble main
Dump of assembler code for function main:
0x80481e0 <main>:       push   %ebp
...
0x80481eb <main+11>:    call   0x8048498 <exit>
End of assembler dump.
-----
There is no information about the CRTSO location.
I have an idea. We'll track it through the program by disassembling at each function's
address minus 1, which lands inside the previous function. If we do
this for each function, we'll probably find the root procedure, the CRTSO!
Let's begin the tracking with this new method.
-----
(gdb) disassemble main-1
Dump of assembler code for function init_dummy:
0x80481d0 <init_dummy>:         push   %ebp
...
0x80481da <init_dummy+10>:      lea    0x0(%esi),%esi
End of assembler dump.
-----
We found the frame_dummy function at address 0x80481d0 - 1.
-----
(gdb) disassemble init_dummy-1
Dump of assembler code for function frame_dummy:
0x80481a0 <frame_dummy>:        push   %ebp
...
0x80481c9 <frame_dummy+41>:     lea    0x0(%esi,1),%esi
End of assembler dump.
-----
We found the fini_dummy function at 0x80481a0 - 1.
-----
(gdb) disassemble frame_dummy-1
Dump of assembler code for function fini_dummy:
0x8048190 <fini_dummy>:         push   %ebp
...
0x804819a <fini_dummy+10>:      lea    0x0(%esi),%esi
End of assembler dump.
-----
We found the __do_global_dtors_aux function at 0x8048190 - 1.
-----
(gdb) disassemble fini_dummy-1
Dump of assembler code for function __do_global_dtors_aux:
0x8048130 <__do_global_dtors_aux>:      push   %ebp
...
0x804818d <__do_global_dtors_aux+93>:   lea    0x0(%esi),%esi
End of assembler dump.
-----
We found the call_gmon_start function at 0x8048130 - 1.
-----
(gdb) disassemble __do_global_dtors_aux-1
Dump of assembler code for function call_gmon_start:
0x8048104 <call_gmon_start>:    push   %ebp
...
0x804812f <call_gmon_start+43>: nop
End of assembler dump.
-----
We found the _start function at 0x8048104 - 1.
Hmm, that's an interesting function, isn't it?
We found it!
-----
(gdb) disassemble call_gmon_start-1
Dump of assembler code for function _start:
0x80480e0 <_start>:     xor    %ebp,%ebp
0x80480e2 <_start+2>:   pop    %esi
0x80480e3 <_start+3>:   mov    %esp,%ecx
0x80480e5 <_start+5>:   and    $0xfffffff0,%esp
0x80480e8 <_start+8>:   push   %eax
0x80480e9 <_start+9>:   push   %esp
0x80480ea <_start+10>:  push   %edx
0x80480eb <_start+11>:  push   $0x808e220
0x80480f0 <_start+16>:  push   $0x80480b4
0x80480f5 <_start+21>:  push   %ecx
0x80480f6 <_start+22>:  push   %esi
0x80480f7 <_start+23>:  push   $0x80481e0
0x80480fc <_start+28>:  call   0x80481f0 <__libc_start_main>
0x8048101 <_start+33>:  hlt
0x8048102 <_start+34>:  mov    %esi,%esi
End of assembler dump.
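-----
The CRTSO hands the parameters over with push instructions. In this Linux
version they are arguments for __libc_start_main rather than for main() directly:
0x80480e2 <_start+2>:   pop    %esi         ! argc, taken from the top of the stack
0x80480e3 <_start+3>:   mov    %esp,%ecx    ! argv now starts at the top of the stack
0x80480f5 <_start+21>:  push   %ecx         ! push argv ( pointer )
0x80480f6 <_start+22>:  push   %esi         ! push argc ( integer )
0x80480f7 <_start+23>:  push   $0x80481e0   ! push the address of main()
The earlier "push %eax" only pads the stack to keep it aligned, and envp is not
pushed here: the environment array sits just after the terminating 0 of argv
( see Figure 1.4 ), so the C library can locate it from argv and argc.
After that, the CRTSO calls the __libc_start_main function ( part of the
libc.so.6 library ). Then __libc_start_main calls the __libc_init_first
function, and this function calls _init. It also arranges for _fini to be called
when the program exits.
Finally, the hlt at 0x8048101 <_start+33> is there to force a trap if exit fails.
After this, all the parameters are on the stack and the main() function of
our program has access to them as shown in Figure 1.4.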
1.3.2: EXEC system call, the nine steps:
----------------------------------------
Now I'll explain every step of the EXEC system call. There are nine
important steps to get the program mapped in memory and executed.
1- First, check for memory and check that the file is executable. ( Lines 24 to 37 in Annex 3 )
   The file access is made on behalf of the MM, so the MM
   informs the FS with the tell_fs() procedure to switch to
   the user's working directory rather than the MM's. The
   executability check itself is done by the MM allowed()
   function.
2- Read the header to get the segment and total sizes. ( Lines 40 to 46 in Annex 3 )
3- Fetch the arguments and environment from the caller. ( Lines 49 to 56 in Annex 3 )
4- Allocate new memory and release unneeded old memory. ( Lines 62 to 66 in Annex 3 )
   Before doing this, we check with the find_shared() procedure ( line 59 in
   Annex 3 ) whether the text is already loaded in memory and can
   be shared with another process. Then we call the new_mem() function.
   This function searches memory for a hole big enough for our
   new memory image ( only the data and stack sections of our application if
   we found an accessible text section to share in memory ). After that, the memory
   maps are updated and the sys_newmap() function reports the changes to the
   kernel.
   Note 1: If the new_mem() procedure doesn't find one hole big enough
           for one, two or three sections, the program won't be executed.
           One way to improve this procedure would be to put the text,
           data and stack sections in three different holes and link them together,
           but Minix does not do this.
           Linux, however, does. First, the data section and
           the code section can be in different virtual memory areas.
           The fact is that under Linux there is an intermediate layer
           between the process and the physical memory, called
           the process's virtual memory (PVM). They are linked together
           with the vm_area_struct structure. This structure is a part
           of the mm_struct structure, which is itself a part of the
           task_struct ( the entries of the vector holding all processes ). In addition,
           this scheme is improved by a technique called "demand
           paging", where the virtual memory of a process is brought into
           physical memory only when the process attempts to use it!
   Note 2: The new_mem() function zeroes the bss segment ( this
           segment is a part of the data segment; the bss contains all
           uninitialised global variables, and I'll talk about it
           below when I discuss the memory layout of the
           application ), the gap and the stack segment.
   Note 2.1: The gap is a memory segment between the bss and the
           stack segment, which allows both of them to grow. I'll
           also talk about this segment in the Memory Layout chapter. Another
           name given to the gap is the user stack frame.
5- Copy the stack to new memory. ( Lines 74 to 81 in Annex 3 )
   The whole stack is copied into a new memory region, the user's memory region.
   Then the patch_ptr() function ( line 077 in Annex 3 ) patches all the pointers so that they
   point into the new memory allocation ( the real place in physical memory and not the
   virtual 0 ). We have now passed from Figure 1.2 to Figure 1.3.
6- Copy the data ( and possibly text ) segment to the new memory image. ( Lines 84 to 89 in Annex 3 )
   Whether the text segment is copied depends on the return value of the find_shared()
   function. If no text segment corresponding to ours is already running in
   memory, the return value is NULL. Otherwise, the find_shared() procedure
   returns a pointer to the corresponding MPROC structure in memory.
   Note 1: The MPROC structure is defined in the /src/mm/mproc.h file.
   Note 2: The 3 fields of the MPROC structure involved are:
           ino_t  mp_ino;     /* inode number of file */
           dev_t  mp_dev;     /* device number of file system */
           time_t mp_ctime;   /* inode changed time */
   Then, if the sh_mp pointer ( mproc ) is NULL, load_seg() loads
   the text segment into memory. After that, load_seg() is
   called again to load the data segment into memory.
7- Check for and handle the setuid and setgid bits. ( Lines 100 to 109 in Annex 3 )
8- Fix up the process table entry. ( Lines 115 to 127 in Annex 3 )
   Here the EXEC call updates all the fields of MPROC with the new memory allocations
   of our user process.
9- Tell the kernel that the process is now runnable. ( Lines 115 to 127 in Annex 3 )
   Finally, the process is announced through the do_exec() procedure in
   "/src/kernel/system.c", which handles sys_exec(). The SYS_EXEC
   message is defined in "/include/minix/com.h". This message
   sets the program counter and stack pointer after the EXEC system call.
Our application ( helloworld ), called by the shell, is now definitely executed
and mapped in user memory. In the next chapter, I'll show you how the program is
laid out in memory.
Chapter 2: Memory Layout of an executed program:
------------------------------------------------
A program is composed of variables ( local and global, static and dynamic ), procedures
and structures. But how are they mapped in memory? How does it work?
Note 1: All the information in this chapter applies to the ELF executable
        file format. That means everything below applies to
        the Linux operating system and other IA-32 ELF-based operating systems like
        OpenBSD, NetBSD, etc.
Here is the basic layout of a program in memory:
 ---------------
|               |
| Arguments and |
|  environment  |
|   variables   |
|               |
|---------------|
|     Stack     |<--|
|(grow downward)|   |
|               |   | User
|               |   | Stack
|               |   | Frame
|               |   |
| (grow upward) |   | ( Mind the Gap )
|     Heap      |<--|
|---------------|
|      BSS      |
|---------------|
|     Data      |
|---------------|
|     Code      |
 ---------------
Figure 1.5
Here is some C code that shows how variables are mapped in
memory.
varinmem.c
------------------
int iGlobalInit = 1;                     /* Global Initialized:   .data */
int iGlobalUnInit;                       /* Global Uninitialized: .bss  */
char *szGlobalP;                         /* Global Uninitialized: .bss  */

void function(char cArgument)
{
    int iLocalInit = 1;                  /* Local Initialized:    stack */
    int iLocalUnInit;                    /* Local Uninitialized:  stack */
    char szLocalP[12] = "Hello World!";  /* Local Initialized:    stack */

    szGlobalP = (char*)malloc( 12 * sizeof(char)); /* Dynamic Variable: heap */
    strncpy(szGlobalP,"Hello World!",12);
}

int main(void)
{
    function(0);                         /* Function call: new environment */
}
-----
I'll compile this code with debugging information for GDB:
[root@Seldon prog]# gcc -o varinmem -ggdb -static varinmem.c
Note 1: Every program in this paper is compiled with these parameters.
2.1: The dissection of the ELF executable file:
-----------------------------------------------
Here are all the section headers of our varinmem program:
[root@Seldon prog]# readelf -e varinmem
...
Table 1:
--------
Section Headers:
  [Nr] Name              Type     Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL     00000000 000000 000000 00      0   0  0
  [ 1] .init             PROGBITS 080480b4 0000b4 000018 00  AX  0   0  4
  [ 2] .text             PROGBITS 080480e0 0000e0 046220 00  AX  0   0 32
  [ 3] .fini             PROGBITS 0808e300 046300 00001e 00  AX  0   0  4
  [ 4] .rodata           PROGBITS 0808e320 046320 00da80 00   A  0   0 32
  [ 5] __libc_subinit    PROGBITS 0809bda0 053da0 000008 00   A  0   0  4
  [ 6] __libc_subfreeres PROGBITS 0809bda8 053da8 00003c 00   A  0   0  4
  [ 7] __libc_atexit     PROGBITS 0809bde4 053de4 000004 00   A  0   0  4
  [ 8] .data             PROGBITS 0809ce00 053e00 001220 00  WA  0   0 32
  [ 9] .eh_frame         PROGBITS 0809e020 055020 000d40 00  WA  0   0  4
  [10] .ctors            PROGBITS 0809ed60 055d60 000008 00  WA  0   0  4
  [11] .dtors            PROGBITS 0809ed68 055d68 000008 00  WA  0   0  4
  [12] .got              PROGBITS 0809ed70 055d70 000010 04  WA  0   0  4
  [13] .sbss             PROGBITS 0809ed80 055d80 000000 00   W  0   0  1
  [14] .bss              NOBITS   0809ed80 055d80 000f44 00  WA  0   0 32
  [15] .stab             PROGBITS 00000000 055d80 0cf030 0c     16   0  4
  [16] .stabstr          STRTAB   00000000 124db0 059911 00      0   0  1
  [17] .comment          PROGBITS 00000000 17e6c1 002e9e 00      0   0  1
  [18] .debug_aranges    PROGBITS 00000000 18155f 000020 00      0   0  1
  [19] .debug_pubnames   PROGBITS 00000000 18157f 000058 00      0   0  1
  [20] .debug_info       PROGBITS 00000000 1815d7 000198 00      0   0  1
  [21] .debug_abbrev     PROGBITS 00000000 18176f 0000c0 00      0   0  1
  [22] .debug_line       PROGBITS 00000000 18182f 00007e 00      0   0  1
  [23] .note.ABI-tag     NOTE     08048094 000094 000020 00   A  0   0  4
  [24] .note             NOTE     00000000 1818ad 001144 00      0   0  1
  [25] .shstrtab         STRTAB   00000000 1829f1 000103 00      0   0  1
  [26] .symtab           SYMTAB   00000000 182f54 005aa0 10     27 26d  4
  [27] .strtab           STRTAB   00000000 1889f4 004b48 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
...
Table 2:
--------
Description of the most important sections
------------------------------------------
.interp   <-----| Path name for a program interpreter
.hash     <-----| Symbol hash table
.dynsym   <-----| Dynamic linking symbol table
.dynstr   <-----| Strings needed for dynamic linking
.init     <-----| Process initialisation code
.plt      <-----| Procedure linkage table
.text     <-----| Executable instructions
.fini     <-----| Process termination code
.rodata   <-----| Read-only data
.data     <-----| Initialised data present in process image
.got      <-----| Global offset table
.dynamic  <-----| Dynamic linking information
.bss      <-----| Uninitialised data present in process image
.stabstr  <-----| Usually names associated with symbol table entries
.comment  <-----| Version control information
.note     <-----| File notes
We now have every section of our program, with a description of the most
important ones. Table 1 gives both the address and the size of each section. Now
we'll have a look at what these sections contain. For this, I'll use GDB.
Let's start with the .init section:
-----
[root@Seldon prog]# readelf -x 1 varinmem
Hex dump of section '.init':
  0x080480b4 0000dbe8 90000000 45e808ec 83e58955 U......E........
  0x080480c4 c3c90004 61b6e800                   ...a....
-----
The readelf program shows us the hex dump of the .init section of our varinmem
program. This information is extracted directly from the binary file. But
can we find out what this hex dump hides? We'll use GDB to see whether we can get at
the assembly code of the .init section at its start address: 0x080480b4.
-----
(gdb) file varinmem
Reading symbols from varinmem...done.
(gdb) disassemble 0x080480b4
Dump of assembler code for function _init:
0x80480b4 <_init>:      push   %ebp
0x80480b5 <_init+1>:    mov    %esp,%ebp
0x80480b7 <_init+3>:    sub    $0x8,%esp
0x80480ba <_init+6>:    call   0x8048104 <call_gmon_start>
0x80480bf <_init+11>:   nop
0x80480c0 <_init+12>:   call   0x80481a0 <frame_dummy>
0x80480c5 <_init+17>:   call   0x808e280 <__do_global_ctors_aux>
0x80480ca <_init+22>:   leave
0x80480cb <_init+23>:   ret
End of assembler dump.
-----
We got it! The .init section is made up of the _init function. Don't
forget, this is not the CRTSO: this routine initialises our program, it does not
start it! So in which section is our _start procedure located?
-----
(gdb) disassemble _start
Dump of assembler code for function _start:
0x80480e0 <_start>:     xor    %ebp,%ebp
...
0x8048102 <_start+34>:  mov    %esi,%esi
End of assembler dump.
-----
The _start function code starts at address 0x80480e0. Now go back to our
section headers ( Table 1 ) and see where this address is located:
-----
  [Nr] Name              Type     Addr     Off    Size   ES Flg Lk Inf Al
  [ 2] .text             PROGBITS 080480e0 0000e0 046220 00  AX  0   0 32
-----
Hmm, the starting address is the same! We got it! The CRTSO procedure is the
first procedure of the .text section!
Now we'll look at the .data section. It is supposed to hold all the initialised
data of the process image.
-----
[root@Seldon prog]# readelf -x 8 varinmem
Hex dump of section '.data':
  0x0809ce00 00000000 0809ed6c 00000000 00000000 ........l.......
  0x0809ce10 00000000 00000000 0809edc0 00000001 ................
...
  0x0809e010 00000000 0000003f 3f783020 65707974 type 0x??.......
-----
(gdb) disassemble 0x0809ce10
Dump of assembler code for function iGlobalInit:
0x809ce10 <iGlobalInit>:        add    %eax,(%eax)
0x809ce12 <iGlobalInit+2>:      add    %al,(%eax)
End of assembler dump.
-----
If we disassemble the first address of the .data section we'll see the
data_start procedure. But if we continue a little further, to address 0x809ce10,
we see an interesting thing! Is this our iGlobalInit variable, initialised
to 1 ( see varinmem.c )? Yes it is! GDB decodes data as if it were instructions:
the bytes 01 00 00 00, the integer 1 stored least significant byte first, come out
as the two add instructions shown above. We just found where every
initialised global variable of our program is put! Okay, but where are our iGlobalUnInit
and szGlobalP variables? They are global too, no?
..Tap...Tap...Tap...Tap...Arg...Tap...Tap...can't find it...Tap...Tap...
much, much later...
Tap...Tap...Tap........
Hmm, hard to find, no? There is another feature of GDB which will help you in
your search:
-----
(gdb) file varinmem
Reading symbols from varinmem...done.
(gdb) list 1
1       int iGlobalInit = 1;
2       int iGlobalUnInit;
3       char *szGlobalP;
-----
(gdb) print &szGlobalP
$1 = (char **) 0x809f7a4
-----
(gdb) info symbol 0x809f7a4
szGlobalP in section .bss
-----
(gdb) disassemble 0x809f7a4
Dump of assembler code for function szGlobalP:
0x809f7a4 <szGlobalP>:          add    %al,(%eax)
0x809f7a6 <szGlobalP+2>:        add    %al,(%eax)
End of assembler dump.
-----
(gdb) disassemble 0x809f7a4-1
Dump of assembler code for function iGlobalUnInit:
0x809f7a0 <iGlobalUnInit>:      add    %al,(%eax)
0x809f7a2 <iGlobalUnInit+2>:    add    %al,(%eax)
End of assembler dump.
-----
We got it, guys! First, I listed the source code of our program to see my
global variables. Then I used the GDB print command with the "&" operator to get
the memory address of my szGlobalP pointer in the program (use the "help print"
command for more information about the print command). Next, I used the "info
symbol" command to find out which symbol and section this address belongs to. I
found that szGlobalP is in the .bss section. Then I disassembled the code at the
szGlobalP address to see if there was anything there, and I found the storage of my
szGlobalP pointer: four zero bytes, which GDB decodes as "add %al,(%eax)"
instructions. After that, I was curious to know what is stored before this address.
That is why I disassembled the address just before the szGlobalP pointer's address.
What a surprise: I found the storage of my uninitialised iGlobalUnInit variable!
Okay, I think it's time to jump to the next chapter.
In conclusion, if you wish to explore other sections, don't hesitate to
continue with the techniques above. Use the "readelf" program to find where
every section starts and stops, then open your program with GDB ( the program needs
to be compiled with debug information for GDB ) and disassemble the addresses of the
sections! I'm sure that you'll find many interesting things by digging through the
code like this!
Chapter 3: The Stack and the Heap:
---------------------------------
We now have two problems: local variables and dynamic variables. The solutions
to these two problems are the two parts of the user stack frame ( You mind the gap? ):
the stack and the heap.
Why aren't initialised and uninitialised local variables given addresses in
.data and .bss at compilation, like global variables? For the same reason that
Linux doesn't load every procedure into memory when the program is executed
(see chapter 1.3 - EXEC step 4 - note 1)! Imagine a heavy program with many hundreds
of procedures, each with many local variables, of which you only use 10 or 20.
There would be many thousands of variables which are never used during the program
execution. Can you imagine the amount of memory wasted for just one running program?
Multiply this by 5 or 10 running programs on the workstation and it gets crazy! This
is the main reason why the compiler doesn't allocate memory for these local variables
in advance. So where are they stored? That is the topic of this chapter. We'll see
how functions (with their arguments and local variables) and dynamic variables are
mapped in memory.
I'll first explain something about pointers and dynamic variables. A pointer
holds a 32-bit memory address on a typical IA-32 PC workstation. A dynamic variable
is the memory zone targeted by a pointer, and the pointer holds its address!
I'll show you how pointers and dynamic variables work with the following
GDB demonstration:
-----
(gdb) list
8       int iLocalUnInit;
9       char szLocalP[12] = "Hello World!";
10
11      szGlobalP = (char*)malloc( 12 * sizeof(char));
12      strncpy(szGlobalP,"Hello World!",12);
13      }
(gdb) break 13
Breakpoint 1 at 0x8048231: file memorylayout.c, line 13.
-----
I first listed my source lines to know where to put my breakpoint so that my
dynamic variable is already initialised. Then I put a breakpoint at line 13.
-----
(gdb) run
Starting program: /root/prog/varinmem
Breakpoint 1, function (cArgument=0 '\000') at varinmem.c:13
13      }
-----
Then I ran the program, and it stopped at the breakpoint.
----(gdb) print szGlobalP
$1 = 0x809ff88 "Hello World!"
-----
Okay, I have my "Hello World!" string at address 0x809ff88. Now I'll check whether
a symbol ( a variable? ) is declared for this memory zone:
----(gdb) info symbol 0x809ff88
No symbol matches 0x809ff88.
-----
Oops, there is no defined symbol at this address!
-----
(gdb) print &szGlobalP
$2 = (char **) 0x809f7a4
-----
Okay, I found the address of my pointer.
-----
(gdb) info symbol 0x809f7a4
szGlobalP in section .bss
-----
My pointer is still in my .bss section. Now we know that 0x809ff88 is the address
of the memory block where the "Hello World!" string is stored, which is why it is
not a defined symbol. This is why we say that the dynamic variable is the memory
zone targeted by a pointer, and the pointer holds its address!
Note 1: As you'll see in section 3.5, these memory blocks are allocated in the heap
section of the user stack frame.
3.1: Where are they?
-------------------
As you know, local and dynamic variables are located in a reserved memory
zone called the "user stack frame" ( USF ). This zone is managed dynamically: parts
of it are created and removed at the top (the stack) and at the bottom (the heap)
every time a function is called or returns, or a dynamic variable is allocated or
freed. The environment of a called function, including its parameters and local
variables, is created in the stack part of the USF. At the opposite end of the USF,
dynamic variables are created in the heap section.
3.2: How to know what's the size of the user stack frame at compilation?
-----------------------------------------------------------------------
If you are using a processor newer than the 8088 ( this is probably the case ),
there is a hardware trap mechanism to catch stack overflow. The program is given a
certain amount of space for the user stack frame but, if the stack tries to grow
beyond this amount, a trap to the operating system occurs, and the operating system
tries to allocate more memory for the stack, if possible.
3.3: Registers:
--------------
Before jumping into the explanation of the stack and the heap, I'll enumerate the
registers and briefly explain their functions.
A register is a small storage location inside the processor whose only purpose is
to hold values of various kinds. Registers are the direct link between the processor
and the memory.
3.3.1: General Registers:
------------------------
These registers can be used to hold and manipulate data, but some of them are
specialised for particular tasks.
A 32bit general register representation:
-----------------------------------------
|   Low   |  High   |                   |
| 8 Bits  | 8 Bits  |                   |
-----------------------------------------
^                   ^                   ^
|   First 16 bits   | Extended 16 Bits  |
Figure 1.6
%EBX representation example:
-----------------------------------------
|   %BL   |   %BH   |                   |
|---------------------------------------|
|        %BX        |                   |
|---------------------------------------|
|                 %EBX                  |
-----------------------------------------
Figure 3.0
There was a revolution in home computing with the arrival of the Intel 386 and
its 32-bit architecture. Older 16-bit processors had general registers 16 bits
long ( %BX ), each composed of two 8-bit halves: the low 8 bits ( %BL ) and the
high 8 bits ( %BH ). With the arrival of this speed monster, the general registers
were extended by another 16 bits, and the whole register is now called %EBX. The
"E" stands for "Extended".
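%EAX: The accumulator. It is used implicitly by several arithmetic instructions
      ( MUL, DIV ) and usually holds the return value of a function.
%EBX: The base register, specialised for indexed addressing.
%ECX: The counter register, specialised for loop management.
      EX:
              movl  $10,%ecx
      exloop: addl  $10,%eax
              loop  exloop      # ECX = ECX-1; the loop stops when %ECX == 0
%EDX: The data register. Together with %EAX it holds the operands and results of
      multiplications and divisions.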
3.3.2: Segments registers:
-------------------------
%CS: This is the code segment. It references the executable code of the running
     application. Its value cannot be loaded directly; it is changed by far
     control transfers ( far CALL, far JMP, far RET, INT, IRET ).
%SS: This is the stack segment. It is associated with the SP and BP registers
     and references the stack, which temporarily stores data such as return
     addresses, parameters and saved registers during function calls.
%ES: This is the extra segment. It is used by the processor for string
     operations, where ES and DI target the destination address.
%DS: This is the data segment. It is the default segment for memory accesses
     through every offset register except SP, BP and IP.
%FS: An additional extra segment, like ES.
%GS: An additional extra segment, like ES.
3.3.3: Offset Registers:
-----------------------
%ESP: Extended Stack Pointer, points to the top of our stack.
%EBP: Extended Base Pointer, points to the start of the local environment of a
      function.
%EDI: Extended Destination Index, holds the destination offset in an operation on
      a memory block.
%ESI: Extended Source Index, points to the beginning of the source memory block in
      such an operation.
%EIP: Extended Instruction Pointer, holds the address of the next instruction to
      execute.
3.4: The stack:
--------------
Every time a function is called, we need to create a new environment for it on
the stack: a place to push the values of its parameters and local variables. In
reality, the spirit of the function is this small part of the stack where values
are held and changed. The rest of the function (all its instructions) is in the
.text section.
I'll use the code below to explain every aspect of stack and heap management
when a function, a pointer or a dynamic variable is used.
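funcinmem.c
-----------
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc */
#include <string.h>  /* strncpy */

int iGlobalInit = 1;     /* initialised global: .data */
int iGlobalUnInit;       /* uninitialised global: .bss */
char *szGlobalP;         /* uninitialised global pointer: .bss */

void function(char *cParameter, int iParameter)
{
    int iLocalInit = 1;
    int iLocalUnInit;
    char szHelloString[12] = "Hello World!";   /* 12 characters, no terminating NUL */
    char *szLocalP;

    iParameter = 5;
    iLocalUnInit = iParameter;
    iGlobalUnInit = iLocalUnInit+1;

    /* The two 12-byte blocks below are the dynamic variables ( never freed here ) */
    szGlobalP = (char*)malloc( 12 * sizeof(char));
    strncpy(szGlobalP,"Hello World!",12);
    szLocalP = (char*)malloc( 12 * sizeof(char));
    strncpy(szLocalP,"Hello World!",12);
    return;
}

int main(int argc, char **argv)
{
    int iMainLocalInit = 2;

    function("test",0);
    iMainLocalInit += 1;
    iGlobalInit += iMainLocalInit;
    printf("iMainLocalInit = %d\n",iMainLocalInit);
}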
------
3.4.2: Stack management when calling a procedure:
------------------------------------------------
A function call is divided into three principal parts:
1- The function call: All parameters are pushed on the stack and the instruction
pointer ( IP ) is saved so that instruction processing can continue after our
function call.
2- The Prologue: At the start of the function, we save the state of the stack
as it was before the function started. Then we reserve the right amount
of memory for the function's local data.
3- The function return: Everything is put back as it was before the function call.
Now, let us disassemble our main() and function() procedures.
Note 1: I'll refer to these listings all the time in this chapter. Don't
hesitate to look at them.
-----
(gdb) disassemble main
Dump of assembler code for function main:
0x8048270 <main>:       push   %ebp
0x8048271 <main+1>:     mov    %esp,%ebp
0x8048273 <main+3>:     sub    $0x8,%esp
0x8048276 <main+6>:     movl   $0x2,0xfffffffc(%ebp)
0x804827d <main+13>:    sub    $0x8,%esp
0x8048280 <main+16>:    push   $0x0
0x8048282 <main+18>:    push   $0x808e3d5
0x8048287 <main+23>:    call   0x80481e0 <function>
0x804828c <main+28>:    add    $0x10,%esp
0x804828f <main+31>:    lea    0xfffffffc(%ebp),%eax
0x8048292 <main+34>:    incl   (%eax)
0x8048294 <main+36>:    mov    0xfffffffc(%ebp),%eax
0x8048297 <main+39>:    add    %eax,0x809cef0
0x804829d <main+45>:    sub    $0x8,%esp
0x80482a0 <main+48>:    pushl  0xfffffffc(%ebp)
0x80482a3 <main+51>:    push   $0x808e3da
0x80482a8 <main+56>:    call   0x804872c <printf>
0x80482ad <main+61>:    add    $0x10,%esp
0x80482b0 <main+64>:    leave
0x80482b1 <main+65>:    ret
End of assembler dump.
-----
(gdb) file funcinmem
Reading symbols from funcinmem...done.
(gdb) disassemble function
Dump of assembler code for function function:
0x80481e0 <function>:       push   %ebp
0x80481e1 <function+1>:     mov    %esp,%ebp
0x80481e3 <function+3>:     push   %edi
0x80481e4 <function+4>:     push   %esi
0x80481e5 <function+5>:     sub    $0x30,%esp
0x80481e8 <function+8>:     movl   $0x1,0xfffffff4(%ebp)
0x80481ef <function+15>:    lea    0xffffffd8(%ebp),%edi
0x80481f2 <function+18>:    mov    $0x808e3c8,%esi
0x80481f7 <function+23>:    cld
0x80481f8 <function+24>:    mov    $0x3,%ecx
0x80481fd <function+29>:    repz movsl %ds:(%esi),%es:(%edi)
0x80481ff <function+31>:    movl   $0x5,0xc(%ebp)
0x8048206 <function+38>:    mov    0xc(%ebp),%eax
0x8048209 <function+41>:    mov    %eax,0xfffffff0(%ebp)
0x804820c <function+44>:    mov    0xfffffff0(%ebp),%eax
0x804820f <function+47>:    inc    %eax
0x8048210 <function+48>:    mov    %eax,0x809f8c4
0x8048215 <function+53>:    sub    $0xc,%esp
0x8048218 <function+56>:    push   $0xc
0x804821a <function+58>:    call   0x8048d78 <__libc_malloc>
0x804821f <function+63>:    add    $0x10,%esp
0x8048222 <function+66>:    mov    %eax,%eax
0x8048224 <function+68>:    mov    %eax,0x809f8c8
0x8048229 <function+73>:    sub    $0x4,%esp
0x804822c <function+76>:    push   $0xc
0x804822e <function+78>:    push   $0x808e3c8
0x8048233 <function+83>:    pushl  0x809f8c8
0x8048239 <function+89>:    call   0x804cbdc <strncpy>
0x804823e <function+94>:    add    $0x10,%esp
0x8048241 <function+97>:    sub    $0xc,%esp
0x8048244 <function+100>:   push   $0xc
0x8048246 <function+102>:   call   0x8048d78 <__libc_malloc>
0x804824b <function+107>:   add    $0x10,%esp
0x804824e <function+110>:   mov    %eax,%eax
0x8048250 <function+112>:   mov    %eax,0xffffffd4(%ebp)
0x8048253 <function+115>:   sub    $0x4,%esp
0x8048256 <function+118>:   push   $0xc
0x8048258 <function+120>:   push   $0x808e3c8
0x804825d <function+125>:   pushl  0xffffffd4(%ebp)
0x8048260 <function+128>:   call   0x804cbdc <strncpy>
0x8048265 <function+133>:   add    $0x10,%esp
0x8048268 <function+136>:   lea    0xfffffff8(%ebp),%esp
0x804826b <function+139>:   pop    %esi
0x804826c <function+140>:   pop    %edi
0x804826d <function+141>:   pop    %ebp
0x804826e <function+142>:   ret
End of assembler dump.
------
3.4.2.1: The call:
-----------------
This is the assembly code of our function call:
-----------------------------------------------
0x8048280 <main+16>:    push   $0x0
0x8048282 <main+18>:    push   $0x808e3d5
0x8048287 <main+23>:    call   0x80481e0 <function>
-----
The function call procedure passes all arguments to the called function and
saves the memory address ( %EIP ) of the instruction following the call, so that
normal program execution can continue after the called function returns.
Okay, let GDB rock:
-----
(gdb) break *0x8048280
Breakpoint 1 at 0x8048280: file funcinmem.c, line 30.
(gdb) run
Starting program: /root/prog/funcinmem
Breakpoint 1, 0x08048280 in main (argc=1, argv=0xbffffa84) at funcinmem.c:30
warning: Source file is more recent than executable.
30      function("test",0);
(gdb) info register esp
esp     0xbffffa08      0xbffffa08
(gdb) stepi
0x08048282      30      function("test",0);
(gdb) info register esp
esp     0xbffffa04      0xbffffa04
(gdb) stepi
0x08048287      30      function("test",0);
(gdb) info register esp
esp     0xbffffa00      0xbffffa00
-----
First, I put a breakpoint at the "push $0x0" instruction to stop GDB before it is
executed, so that I have time to look at the state of our register. Then I ran our
"funcinmem" program.
The two "push" instructions before the "call" instruction put our arguments on
the stack. Each one moves the %ESP register one dword down the stack, 2 dwords in
total ( 0xbffffa08 -> 0xbffffa00 ).
    |               |                         |               |
    |---------------|                         |---------------|
  X |               | <-- %ebp      0xbffffa18|               | <-- %ebp
    |---------------|                         |---------------|
    |               |                         |               |
    |---------------|                         |---------------|
  Y |               |               0xbffffa08|               |
    |---------------|                         |---------------|
    |       0       |               0xbffffa04|      0x0      |
    |---------------|                         |---------------|
    |     test      | <-- %esp      0xbffffa00|   0x808e3d5   | <-- %esp
    |---------------|                         |---------------|
    |               |                         |               |
-----
Figure 3.1
-----
(gdb) x/4c 0x808e3d5
0x808e3d5 <_IO_stdin_used+17>:  116 't' 101 'e' 115 's' 116 't'
(gdb) info symbol 0x808e3d5
_IO_stdin_used + 17 in section .rodata
-----
I was curious about this address, so I examined the memory there and confirmed
that it is where the "test" argument string is held in memory.
I also found that this string is in the .rodata (read only data) section of the
ELF executable file!
-----
(gdb) stepi
function (cParameter=0x1 <Address 0x1 out of bounds>, iParameter=1073743228) at funcinmem.c:6
6       {
(gdb) info register esp
esp     0xbffff9fc      0xbffff9fc
(gdb) info register eip
eip     0x80481e0       0x80481e0
(gdb) info register esp
esp     0xbffff9fc      0xbffff9fc
(gdb) x 0xbffff9fc
0xbffff9fc:     0x0804828c
(gdb) x 0x0804828c
0x804828c <main+28>:    0x8d10c483
(gdb) disassemble main+28
...
0x8048287 <main+23>:    call   0x80481e0 <function>
0x804828c <main+28>:    add    $0x10,%esp
...
-----
Now we have executed the "call 0x80481e0 <function>" instruction. %EIP was pushed
on the stack so that normal execution of our program can continue after the
"function" procedure returns. Then the address of the first instruction of our
procedure ( 0x80481e0, push %ebp ) was put in %EIP, which is where the processor
continues.
%ESP moved another dword down in memory. Why? I examined the memory that %ESP
points to, at address 0xbffff9fc, and found that it holds the value 0x0804828c. The
bytes stored at 0x0804828c ( 0x8d10c483 when read as a dword ) are the encoding of
the "add $0x10,%esp" instruction of our main function, as the disassembly confirms.
We got it! This is the address of the next instruction to execute after the return
of our "function" procedure! The %EIP register was implicitly pushed on the stack by
the "call" instruction before being set to 0x80481e0.
    |               |                         |               |
    |---------------|                         |---------------|
  X |               | <-- %ebp      0xbffffa18|               | <-- %ebp
    |---------------|                         |---------------|
    |               |                         |               |
    |---------------|                         |---------------|
  Y |               |               0xbffffa08|               |
    |---------------|                         |---------------|
    |       0       |               0xbffffa04|      0x0      |
    |---------------|                         |---------------|
    |     test      |               0xbffffa00|   0x808e3d5   |
    |---------------|                         |---------------|
    |       Z       | <-- %esp      0xbffff9fc|  0x0804828c   | <-- %esp
    |---------------|                         |---------------|
    |               |                         |               |
Z = The address to pop after the procedure call to
    continue the normal execution of the program.
-----
Figure 3.2
------
3.4.2.2: The Prologue:
----------------------
This is the assembly code of our prologue:
------------------------------------------
0x80481e0 <function>:       push   %ebp
0x80481e1 <function+1>:     mov    %esp,%ebp
0x80481e3 <function+3>:     push   %edi
0x80481e4 <function+4>:     push   %esi
0x80481e5 <function+5>:     sub    $0x30,%esp
-----
Okay, let GDB rock:
-----
(gdb) break *0x80481e0
Breakpoint 1 at 0x80481e0: file funcinmem.c, line 6.
(gdb) run
Starting program: /root/prog/funcinmem
Breakpoint 1, function (cParameter=0x1 <Address 0x1 out of bounds>, iParameter=1073743228) at funcinmem.c:6
6       {
-----
First, I put a breakpoint on the first instruction of our prologue and started
the program. Execution stopped at our breakpoint, the "push %ebp" instruction
(remember, the push has not been executed yet: the breakpoint stops before it).
-----
(gdb) info register ebp
ebp     0xbffffa18      0xbffffa18
(gdb) info register esp
esp     0xbffff9fc      0xbffff9fc
-----
Here our %EBP register points to memory address X and %ESP to memory address Y.
-----
    |               |                         |               |
    |---------------|                         |---------------|
  X |               | <-- %ebp      0xbffffa18|               | <-- %ebp
    |---------------|                         |---------------|
    |               |                         |               |
    |---------------|                         |---------------|
  Y |               | <-- %esp      0xbffff9fc|               | <-- %esp
    |---------------|                         |---------------|
    |               |                         |               |
-----
Figure 3.3
-----
(gdb) stepi
0x080481e1      6       {
(gdb) info register ebp
ebp     0xbffffa18      0xbffffa18
(gdb) info register esp
esp     0xbffff9f8      0xbffff9f8
-----
We executed the "push %ebp" instruction with the "stepi" GDB command. It saves
the current environment by pushing the current value of %EBP on the stack. %ESP
therefore decreases by one dword. This is why you can see the %ESP register change
after the "push %ebp" instruction:
  0xbffff9fc
- 0xbffff9f8
  ----------
  0x00000004   ( A dword )
-----
    |               |                         |               |
    |---------------|                         |---------------|
  X |               | <-- %ebp      0xbffffa18|               | <-- %ebp
    |---------------|                         |---------------|
    |               |                         |               |
    |---------------|                         |---------------|
  Y |               |               0xbffff9fc|               |
    |---------------|                         |---------------|
    |       X       | <-- %esp      0xbffff9f8|  0xbffffa18   | <-- %esp
    |---------------|                         |---------------|
    |               |                         |               |
-----
Figure 3.4
-----
(gdb) stepi
0x080481e3 in function (cParameter=0x808e3d5 "test", iParameter=0) at funcinmem.c:6
6       {
(gdb) info register ebp
ebp     0xbffff9f8      0xbffff9f8
(gdb) info register esp
esp     0xbffff9f8      0xbffff9f8
-----
Here we executed the "mov %esp,%ebp" instruction with the "stepi" GDB command.
It copies the value of %ESP into %EBP, which creates a new environment for our
called procedure. Both %ESP and %EBP now point to the same address: the stack slot
that holds the old environment ( the saved %EBP ).
-----
    |               |                         |               |
    |---------------|                         |---------------|
  X |               |               0xbffffa18|               |
    |---------------|                         |---------------|
    |               |                         |               |
    |---------------|                         |---------------|
  Y |               |               0xbffff9fc|               |
    |---------------|                         |---------------|
    |       X       | <-- %esp,     0xbffff9f8|  0xbffffa18   | <-- %esp,
    |---------------|     %ebp                |---------------|     %ebp
    |               |                         |               |
-----
Figure 3.5
-----
(gdb) stepi
0x080481e4      6       {
(gdb) info register esp
esp     0xbffff9f4      0xbffff9f4
(gdb) stepi
0x080481e5      6       {
(gdb) info register esp
esp     0xbffff9f0      0xbffff9f0
-----
Here we save the state of %EDI and %ESI by pushing them on the stack.
    |               |                         |               |
    |---------------|                         |---------------|
  X |               |               0xbffffa18|               |
    |---------------|                         |---------------|
    |               |                         |               |
    |---------------|                         |---------------|
  Y |               |               0xbffff9fc|               |
    |---------------|                         |---------------|
    |       X       | <-- %ebp      0xbffff9f8|  0xbffffa18   | <-- %ebp
    |---------------|                         |---------------|
    |       V       |               0xbffff9f4|      0x1      |
    |---------------|                         |---------------|
    |       W       | <-- %esp      0xbffff9f0|  0xbffffa84   | <-- %esp
    |---------------|                         |---------------|
    |               |                         |               |
V = %EDI
W = %ESI
-----
Figure 3.6
-----
Below is the stack layout after all the local variables have been set up,
including the padding dwords ( null values ) that the compiler adds to align the
stack frame.
-----
(gdb) break *0x8048265
Breakpoint 3 at 0x8048265
(gdb) run
Starting program: /root/prog/funcinmem
Breakpoint 1, 0x08048265 in function (cParameter=0x808e3d5 "test", iParameter=5) at funcinmem.c:21
21      strncpy(szLocalP,"Hello World!",12);
(gdb) info register ebp
ebp     0xbffff9f8      0xbffff9f8
(gdb) info register esp
esp     0xbffff9b0      0xbffff9b0
(gdb) x 0xbffff9f8
0xbffff9f8:     0xbffffa18
-----
The saved environment of our main() function
-----
(gdb) x 0xbffff9f4
0xbffff9f4:     0x00000001
-----
Saved %EDI
-----
(gdb) x 0xbffff9f0
0xbffff9f0:     0xbffffa84
-----
Saved %ESI
-----
(gdb) x 0xbffff9ec
0xbffff9ec:     0x00000001
-----
iLocalInit
-----
(gdb) x 0xbffff9e8
0xbffff9e8:     0x00000005
-----
iLocalUnInit = iParameter;
-----
(gdb) x 0xbffff9e4
0xbffff9e4:     0x00000000
-----
Padding
-----
(gdb) x 0xbffff9e0
0xbffff9e0:     0x00000000
-----
Padding
-----
(gdb) x 0xbffff9dc
0xbffff9dc:     0x00000000
-----
Padding
-----
(gdb) x 0xbffff9d8
0xbffff9d8:     0x21646c72
-----
szHelloString ( bytes 8-11: "rld!" )
-----
(gdb) x 0xbffff9d4
0xbffff9d4:     0x6f57206f
-----
szHelloString ( bytes 4-7: "o Wo" )
-----
(gdb) x 0xbffff9d0
0xbffff9d0:     0x6c6c6548
-----
szHelloString ( bytes 0-3: "Hell" )
-----
(gdb) x 0xbffff9cc
0xbffff9cc:     0x080a00d8
-----
szLocalP
-----
(gdb) x 0xbffff9c8
0xbffff9c8:     0x00000000
-----
Padding
-----
(gdb) x 0xbffff9c4
0xbffff9c4:     0x00000000
-----
Padding
-----
(gdb) x 0xbffff9c0
0xbffff9c0:     0x00000000
-----
Padding
------
Above is the stack mapping after all the local variables have been set up,
including the padding dwords added by the compiler.
0x80481e5 <function+5>:     sub    $0x30,%esp
With this instruction, we reserve 12 dwords ( 0x30 = 48 bytes ) on the stack for
our local variables. Don't forget, the stack is managed in dwords ( 4 bytes or
32 bits ). In this run, the dwords that no variable uses contain the 0x0 value.
    |               |                         |               |
    |---------------|                         |---------------|
  X |               |               0xbffffa18|               |
    |---------------|                         |---------------|
    |               |                         |               |
    |---------------|                         |---------------|
  Y |               |               0xbffff9fc|               |
    |---------------|                         |---------------|
    |       X       | <-- %ebp      0xbffff9f8|  0xbffffa18   | <-- %ebp
    |---------------|                         |---------------|
    |       V       |               0xbffff9f4|      0x1      |
    |---------------|                         |---------------|
    |       W       |               0xbffff9f0|  0xbffffa84   |
    |---------------|                         |---------------|
    |   Allocated   |               0xbffff9ec|               |
    |---------------|                         |---------------|
    |   Allocated   |               0xbffff9e8|               |
    |---------------|                         |---------------|
    |   Allocated   |               0xbffff9e4|               |
    |---------------|                         |---------------|
    |   Allocated   |               0xbffff9e0|               |
    |---------------|                         |---------------|
    |   Allocated   |               0xbffff9dc|               |
    |---------------|                         |---------------|
    |   Allocated   |               0xbffff9d8|               |
    |---------------|                         |---------------|
    |   Allocated   |               0xbffff9d4|               |
    |---------------|                         |---------------|
    |   Allocated   |               0xbffff9d0|               |
    |---------------|                         |---------------|
    |   Allocated   |               0xbffff9cc|               |
    |---------------|                         |---------------|
    |   Allocated   |               0xbffff9c8|               |
    |---------------|                         |---------------|
    |   Allocated   |               0xbffff9c4|               |
    |---------------|                         |---------------|
    |   Allocated   | <-- %esp      0xbffff9c0|               | <-- %esp
    |---------------|                         |---------------|
    |               |                         |               |
V = %EDI
W = %ESI
-----
Figure 3.7
After the memory allocation, we put the values of our variables into these
slots. The memory now looks like this:
    |               |                         |               |
    |---------------|                         |---------------|
  X |               |               0xbffffa18|               |
    |---------------|                         |---------------|
    |               |                         |               |
    |---------------|                         |---------------|
  Y |               |               0xbffff9fc|               |
    |---------------|                         |---------------|
    |       X       | <-- %ebp      0xbffff9f8|  0xbffffa18   | <-- %ebp
    |---------------|                         |---------------|
    |       V       |               0xbffff9f4|      0x1      |
    |---------------|                         |---------------|
    |       W       |               0xbffff9f0|  0xbffffa84   |
    |---------------|                         |---------------|
    |  iLocalInit   |               0xbffff9ec|  0x00000001   |
    |---------------|                         |---------------|
    | iLocalUnInit  |               0xbffff9e8|  0x00000005   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9e4|  0x00000000   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9e0|  0x00000000   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9dc|  0x00000000   |
    |---------------|                         |---------------|
    | szHelloString |               0xbffff9d8|  0x21646c72   | "!dlr"
    |---------------|                         |---------------|
    | szHelloString |               0xbffff9d4|  0x6f57206f   | "oW o"
    |---------------|                         |---------------|
    | szHelloString |               0xbffff9d0|  0x6c6c6548   | "lleH"
    |---------------|                         |---------------|
    |   szLocalP    |               0xbffff9cc|  0x080a00d8   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9c8|  0x00000000   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9c4|  0x00000000   |
    |---------------|                         |---------------|
    |   [Padding]   | <-- %esp      0xbffff9c0|  0x00000000   | <-- %esp
    |---------------|                         |---------------|
    |               |                         |               |
V = %EDI
W = %ESI
-----
Figure 3.8
Okay, here we need some explanations. We have these local variables:
-----
int iLocalInit = 1;                       /* Will take 1 dword in the stack */
int iLocalUnInit;                         /* Will take 1 dword in the stack */
char szHelloString[12] = "Hello World!";  /* Will take 3 dwords in the stack:
                                             the 12 characters fill the array
                                             exactly, so no terminating NUL is
                                             stored. Don't forget, the stack is
                                             divided in dwords, not in bytes. */
char *szLocalP;                           /* Will take 1 dword in the stack */
------
In this case, %ESP is only supposed to decrease by 6 dwords, not 12!
In theory yes, but in reality it isn't so. The size of the stack allocation
varies from compiler to compiler, operating system to operating system, and
architecture to architecture. In fact, the compiler pads the stack frame for
proper internal alignment. This is why we have 6 dwords padded with 0x00000000.
For example: if a signed integer were 16 bits wide ( int16_t in stdint.h ) and
not 32 bits ( int32_t ), you would be able to put 2 signed integers in a dword and
not just one as in our example.
3.4.2.3: The return:
-------------------
The return procedure restores the environment that was present before our
function call. In our example, the environment of our main() procedure is
restored with the same values as before the call to the function() procedure.
This is the assembly code of the return procedure:
--------------------------------------------------
0x8048268 <function+136>:   lea    0xfffffff8(%ebp),%esp
0x804826b <function+139>:   pop    %esi
0x804826c <function+140>:   pop    %edi
0x804826d <function+141>:   pop    %ebp
0x804826e <function+142>:   ret
------
Okay, let GDB rock another time!
-----
(gdb) break *0x8048268
Breakpoint 1 at 0x8048268: file funcinmem.c, line 24.
(gdb) run
Starting program: /root/prog/funcinmem
Breakpoint 1, function (cParameter=0x808e3d5 "test", iParameter=5) at funcinmem.c:24
warning: Source file is more recent than executable.
24      }
(gdb) info register ebp
ebp     0xbffff9f8      0xbffff9f8
(gdb) info register esp
esp     0xbffff9c0      0xbffff9c0
-----
First, I put a breakpoint at the "lea 0xfffffff8(%ebp),%esp" instruction. Then I
checked the state of the %ESP and %EBP registers before the execution of this
instruction.
-----
(gdb) stepi
0x0804826b      24      }
(gdb) info register ebp
ebp     0xbffff9f8      0xbffff9f8
(gdb) info register esp
esp     0xbffff9f0      0xbffff9f0
-----
I executed the "lea" instruction with the "stepi" GDB command. Then I rechecked
the state of the %ESP and %EBP registers. %ESP has changed: its value has increased,
so it has moved up the stack. Now, the stack looks like this:
    |               |                         |               |
    |---------------|                         |---------------|
  X |               |               0xbffffa18|               |
    |---------------|                         |---------------|
    |               |                         |               |
    |---------------|                         |---------------|
  Y |               |               0xbffff9fc|               |
    |---------------|                         |---------------|
    |       X       | <-- %ebp      0xbffff9f8|  0xbffffa18   | <-- %ebp
    |---------------|                         |---------------|
    |       V       |               0xbffff9f4|      0x1      |
    |---------------|                         |---------------|
    |       W       | <-- %esp      0xbffff9f0|  0xbffffa84   | <-- %esp
    |---------------|                         |---------------|
    |  iLocalInit   |               0xbffff9ec|  0x00000001   |
    |---------------|                         |---------------|
    | iLocalUnInit  |               0xbffff9e8|  0x00000005   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9e4|  0x00000000   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9e0|  0x00000000   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9dc|  0x00000000   |
    |---------------|                         |---------------|
    | szHelloString |               0xbffff9d8|  0x21646c72   | "!dlr"
    |---------------|                         |---------------|
    | szHelloString |               0xbffff9d4|  0x6f57206f   | "oW o"
    |---------------|                         |---------------|
    | szHelloString |               0xbffff9d0|  0x6c6c6548   | "lleH"
    |---------------|                         |---------------|
    |   szLocalP    |               0xbffff9cc|  0x080a00d8   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9c8|  0x00000000   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9c4|  0x00000000   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9c0|  0x00000000   |
    |---------------|                         |---------------|
    |               |                         |               |
V = %EDI
W = %ESI
-----
Figure 3.9
Okay, we moved %ESP 12 dwords up the stack! We recover the stack with its old
layout, the same as before the memory allocation for the local variables.
-----
(gdb) info register edi
edi     0xbffff9dc      -1073743396
(gdb) info register esi
esi     0x808e3d4       134800340
(gdb) info register esp
esp     0xbffff9f0      0xbffff9f0
(gdb) stepi
0x0804826c      24      }
(gdb) info register edi
edi     0xbffff9dc      -1073743396
(gdb) info register esi
esi     0xbffffa84      -1073743228
(gdb) info register esp
esp     0xbffff9f4      0xbffff9f4
(gdb) stepi
0x0804826d      24      }
(gdb) info register edi
edi     0x1     1
(gdb) info register esi
esi     0xbffffa84      -1073743228
(gdb) info register esp
esp     0xbffff9f8      0xbffff9f8
-----
First, %ESI was popped from the stack, so %ESP increased by one dword. Then
%EDI was also popped from the stack, so %ESP increased by another dword. %ESI and
%EDI now have the values they had in the environment of the main() procedure before
the function() procedure was called.
The stack now looks like this:
    |               |                         |               |
    |---------------|                         |---------------|
  X |               |               0xbffffa18|               |
    |---------------|                         |---------------|
    |               |                         |               |
    |---------------|                         |---------------|
  Y |               |               0xbffff9fc|               |
    |---------------|                         |---------------|
    |       X       | <-- %ebp,     0xbffff9f8|  0xbffffa18   | <-- %ebp,
    |---------------|     %esp                |---------------|     %esp
    |       V       |               0xbffff9f4|      0x1      |
    |---------------|                         |---------------|
    |       W       |               0xbffff9f0|  0xbffffa84   |
    |---------------|                         |---------------|
    |  iLocalInit   |               0xbffff9ec|  0x00000001   |
    |---------------|                         |---------------|
    | iLocalUnInit  |               0xbffff9e8|  0x00000005   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9e4|  0x00000000   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9e0|  0x00000000   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9dc|  0x00000000   |
    |---------------|                         |---------------|
    | szHelloString |               0xbffff9d8|  0x21646c72   | "!dlr"
    |---------------|                         |---------------|
    | szHelloString |               0xbffff9d4|  0x6f57206f   | "oW o"
    |---------------|                         |---------------|
    | szHelloString |               0xbffff9d0|  0x6c6c6548   | "lleH"
    |---------------|                         |---------------|
    |   szLocalP    |               0xbffff9cc|  0x080a00d8   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9c8|  0x00000000   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9c4|  0x00000000   |
    |---------------|                         |---------------|
    |   [Padding]   |               0xbffff9c0|  0x00000000   |
    |---------------|                         |---------------|
    |               |                         |               |
V = %EDI
W = %ESI
-----
Figure 3.10
-----
(gdb) info register ebp
ebp            0xbffff9f8       0xbffff9f8
(gdb) info register esp
esp            0xbffff9f8       0xbffff9f8
(gdb) stepi
0x0804826e in function (cParameter=0x1 <Address 0x1 out of bounds>, iParameter=1073743228) at funcinmem.c:24
24      }
(gdb) info register ebp
ebp            0xbffffa18       0xbffffa18
(gdb) info register esp
esp            0xbffff9fc       0xbffff9fc
-----
The %ESP and %EBP registers were at the same memory address. Then %EBP was
popped from the stack, so %EBP now points to its old memory position. This
pop also increased %ESP by one dword, so it too points to its old memory
address (the one it had just after the call)!
-----
(gdb) info register esp
esp            0xbffff9fc       0xbffff9fc
(gdb) info register eip
eip            0x804826e        0x804826e
(gdb) stepi
0x0804828c in main (argc=1, argv=0xbffffa84) at funcinmem.c:30
30        function("test",0);
(gdb) info register esp
esp            0xbffffa00       0xbffffa00
(gdb) info register eip
eip            0x804828c        0x804828c
-----
If you remember section 3.4.2.1, the call command pushed %EIP on the stack.
So the "ret" command implicitly pops that saved value from the stack back
into the %EIP register. %ESP is therefore increased by another dword and now
points to the old first parameter of the function() procedure. Finally, %EIP
is ready to execute the next command of the main() procedure.
           |               |
           |---------------|
 0xbffffa18|               | X               <-- %ebp
           |---------------|
           |               |
           |---------------|
 0xbffffa08|               | Y
           |---------------|
 0xbffffa04|      0x0      | 0
           |---------------|
 0xbffffa00|   0x808e3d5   | test            <-- %esp
           |---------------|
 0xbffff9fc|  0x0804828c   | Z
           |---------------|
           |               |
-----Figure 3.11
3.5: The Heap:
-------------
The heap is a memory zone dynamically allocated by an application. As we
know, global uninitialised variables are defined in the .bss ELF section.
By contrast, local uninitialised variables are defined on the stack (as we
saw above). The heap comes into play when the length of the memory zone to allocate
isn't known at compilation. The length of an integer, for example, is known at
compilation: on the x86 architecture used here, an int is 32 bits long. So the value
of a global uninitialised integer is stored in the .bss segment and the value of a
local uninitialised integer is stored on the stack. But this is not the case for a
dynamic variable. Remember the definition of a dynamic variable: a dynamic variable
is the memory zone targeted by a pointer. If we put these two definitions together,
dynamic variables are put in the heap, because the length of a dynamic variable can
change at any time during program execution and the heap is structured to allocate
memory dynamically at any time. But don't forget that the pointer itself only holds
the address of the memory block, and the length of a pointer is known at compilation
(32 bits in this case). So a global and a local pointer are always defined in the
.bss and on the stack respectively (like an integer).
This is what the .bss section looks like after the initialisation of our local and
global variables:
          |               |
          |     Heap      |
          |               |
 0x809fe04|---------------| <----
          |               |     |
         ///             ///    |
          |               |     |
          |---------------|     |
 0x809f8c8|   szGlobalP   |     | BSS
          |---------------|     | Segment
 0x809f8c4| iGlobalUnInit |     |
          |---------------|     |
          |               |     |
         ///             ///    |
          |               |     |
 0x809ee80|---------------| <----
          |               |
          |     Data      |
          |               |
-----Figure 3.12
Our two global uninitialised variables are now in the .bss section.
But now we want to know what the heap will look like after the
initialisation of our dynamic variables in the function() procedure.
Before continuing, keep in mind that the heap grows upward in the user
stack frame, toward the stack, and not downward like the stack. Yes, let
GDB rock another time:
-----
(gdb) info symbol &szGlobalP
szGlobalP in section .bss
(gdb) print &szGlobalP
$1 = (char **) 0x809f8c8
(gdb) print szGlobalP
$2 = 0x80a00c8 "Hello World!\021"
(gdb) x/4c 0x80a00c8
0x80a00c8:      72 'H'  101 'e' 108 'l' 108 'l'
(gdb) x/4c 0x80a00cc
0x80a00cc:      111 'o' 32 ' '  87 'W'  111 'o'
(gdb) x/4c 0x80a00d0
0x80a00d0:      114 'r' 108 'l' 100 'd' 33 '!'
(gdb) x/4c 0x80a00d4
0x80a00d4:      17 '\021'       0 '\000'        0 '\000'        0 '\000'
(gdb) info symbol &szLocalP
No symbol matches &szLocalP.
(gdb) print &szLocalP
$5 = (char **) 0xbffff9cc
(gdb) print szLocalP
$6 = 0x80a00d8 "Hello World!!\017"
(gdb) x/4c 0x80a00d8
0x80a00d8:      72 'H'  101 'e' 108 'l' 108 'l'
(gdb) x/4c 0x80a00dc
0x80a00dc:      111 'o' 32 ' '  87 'W'  111 'o'
(gdb) x/4c 0x80a00e0
0x80a00e0:      114 'r' 108 'l' 100 'd' 33 '!'
(gdb) x/4c 0x80a00e4
0x80a00e4:      33 '!'  15 '\017'       0 '\000'        0 '\000'
-----
Note 1: There isn't any symbol defined for szLocalP because it is a local
variable, created on the stack.
Finally, the heap looks like this after the execution of the function()
procedure:

            3   2   1   0
          |               |
          |     Stack     |
          |               |
          |---------------|
          |               |
         ///             ///
          |               |
          |---------------|
 0x80a00e4|0x0|0x0|017| ! | <----
          |---------------|     |
 0x80a00e0| ! | d | l | r |     |
          |---------------|     | szLocalP
 0x80a00dc| o | W |' '| o |     |
          |---------------|     |
 0x80a00d8| l | l | e | H | <----
          |---------------|
 0x80a00d4|0x0|0x0|0x0|021| <----
          |---------------|     |
 0x80a00d0| ! | d | l | r |     |
          |---------------|     | szGlobalP
 0x80a00cc| o | W |' '| o |     |
          |---------------|     |
 0x80a00c8| l | l | e | H | <----
          |---------------|
          |               |
         ///             ///
          |               |
 0x809fe08|---------------|
          |               |
          |      BSS      |
          |               |
-----Figure 3.13
The upper limit of the heap is referred to as the break point. If you need more
dynamic memory, you'll need to move the break point. A system call, BRK, is used to
tell the kernel that the application needs more dynamic memory. After receiving the
BRK system call, the kernel performs some routines to find out whether it can move
the break point upward in the user stack frame to allocate more memory. If the
operation is successful, the kernel updates the process tables with the new
information and the heap size increases in the user stack frame. If there isn't
enough memory for the allocation requested by BRK, the kernel returns -1 and the
application won't be able to allocate memory for the dynamic variable.
The Conclusion:
--------------
Finally, I hope you had as much fun reading this article as I had writing it. I
think many of its sections are helpful not just for understanding how a program is
mapped in memory and how it is executed, but also for understanding how basic
programming concepts work in a high level programming language. I also think that
this method of "dissection" is the most visual way to understand the concept.
The methods used above can easily be applied to understand how a high level
programming language works, how a certain compiler works and how an architecture
works. For example, we can easily understand the impact of global variables on the
system by understanding how they are mapped in memory. We can also understand how C
variable types work on the system by debugging the application and finding the
section in which the variable's symbol is defined; from that we will know whether
its size is static or dynamic, whether the variable is global or local, etc.
Understanding how programs are mapped in memory is understanding how programs
really work.
If you have any questions, comments, additions or errors to report, please send
me an email at: fred@decatomb.com
================================================================================
================================================================================
Annex 1 :
--------
do_fork() - Function Code - src/mm/forkexit.c
---------------------------------------------
/*===========================================================================*
 *                              do_fork                                      *
 *===========================================================================*/
PUBLIC int do_fork()
{
/* The process pointed to by 'mp' has forked.  Create a child process. */

  register struct mproc *rmp;   /* pointer to parent */
  register struct mproc *rmc;   /* pointer to child */
  int i, child_nr, t;
  phys_clicks prog_clicks, child_base = 0;
  phys_bytes prog_bytes, parent_abs, child_abs; /* Intel only */

 /* If tables might fill up during FORK, don't even start since recovery half
  * way through is such a nuisance.
  */
  rmp = mp;
  if (procs_in_use == NR_PROCS) return(EAGAIN);
  if (procs_in_use >= NR_PROCS-LAST_FEW && rmp->mp_effuid != 0)return(EAGAIN);

  /* Determine how much memory to allocate.  Only the data and stack need to
   * be copied, because the text segment is either shared or of zero length.
   */
  prog_clicks = (phys_clicks) rmp->mp_seg[S].mem_len;
  prog_clicks += (rmp->mp_seg[S].mem_vir - rmp->mp_seg[D].mem_vir);
#if (SHADOWING == 0)
  prog_bytes = (phys_bytes) prog_clicks << CLICK_SHIFT;
#endif
  if ( (child_base = alloc_mem(prog_clicks)) == NO_MEM) return(ENOMEM);

#if (SHADOWING == 0)
  /* Create a copy of the parent's core image for the child. */
  child_abs = (phys_bytes) child_base << CLICK_SHIFT;
  parent_abs = (phys_bytes) rmp->mp_seg[D].mem_phys << CLICK_SHIFT;
  i = sys_copy(ABS, 0, parent_abs, ABS, 0, child_abs, prog_bytes);
  if (i < 0) panic("do_fork can't copy", i);
#endif

  /* Find a slot in 'mproc' for the child process.  A slot must exist. */
  for (rmc = &mproc[0]; rmc < &mproc[NR_PROCS]; rmc++)
        if ( (rmc->mp_flags & IN_USE) == 0) break;

  /* Set up the child and its memory map; copy its 'mproc' slot from parent. */
  child_nr = (int)(rmc - mproc);        /* slot number of the child */
  procs_in_use++;
  *rmc = *rmp;                  /* copy parent's process slot to child's */

  rmc->mp_parent = who;         /* record child's parent */
  rmc->mp_flags &= ~TRACED;     /* child does not inherit trace status */
#if (SHADOWING == 0)
  /* A separate I&D child keeps the parents text segment.  The data and stack
   * segments must refer to the new copy.
   */
  if (!(rmc->mp_flags & SEPARATE)) rmc->mp_seg[T].mem_phys = child_base;
  rmc->mp_seg[D].mem_phys = child_base;
  rmc->mp_seg[S].mem_phys = rmc->mp_seg[D].mem_phys +
                        (rmp->mp_seg[S].mem_vir - rmp->mp_seg[D].mem_vir);
#endif
  rmc->mp_exitstatus = 0;
  rmc->mp_sigstatus = 0;

  /* Find a free pid for the child and put it in the table. */
  do {
        t = 0;                  /* 't' = 0 means pid still free */
        next_pid = (next_pid < 30000 ? next_pid + 1 : INIT_PID + 1);
        for (rmp = &mproc[0]; rmp < &mproc[NR_PROCS]; rmp++)
                if (rmp->mp_pid == next_pid || rmp->mp_procgrp == next_pid) {
                        t = 1;
                        break;
                }
        rmc->mp_pid = next_pid; /* assign pid to child */
  } while (t);
  /* Tell kernel and file system about the (now successful) FORK. */
  sys_fork(who, child_nr, rmc->mp_pid, child_base); /* child_base is 68K only*/
  tell_fs(FORK, who, child_nr, rmc->mp_pid);

#if (SHADOWING == 0)
  /* Report child's memory map to kernel. */
  sys_newmap(child_nr, rmc->mp_seg);
#endif

  /* Reply to child to wake it up. */
  reply(child_nr, 0, 0, NIL_PTR);
  return(next_pid);             /* child's pid */
}
------
Annex 2 :
--------
Process Table definition - src/kernel/proc.h
-------------------------------------------
struct proc {
  struct stackframe_s p_reg;    /* process' registers saved in stack frame */

#if (CHIP == INTEL)
  reg_t p_ldt_sel;              /* selector in gdt giving ldt base and limit*/
  struct segdesc_s p_ldt[2];    /* local descriptors for code and data */
                                /* 2 is LDT_SIZE - avoid include protect.h */
#endif /* (CHIP == INTEL) */

#if (CHIP == M68000)
  reg_t p_splow;                /* lowest observed stack value */
  int p_trap;                   /* trap type (only low byte) */
#if (SHADOWING == 0)
  char *p_crp;                  /* mmu table pointer (really struct _rpr *) */
#else
  phys_clicks p_shadow;         /* set if shadowed process image */
  int align;                    /* make the struct size a multiple of 4 */
#endif
  int p_nflips;                 /* statistics */
  char p_physio;                /* cannot be (un)shadowed now if set */
#if defined(FPP)
  struct fsave p_fsave;         /* FPP state frame and registers */
  int align2;                   /* make the struct size a multiple of 4 */
#endif
#endif /* (CHIP == M68000) */

  reg_t *p_stguard;             /* stack guard word */

  int p_nr;                     /* number of this process (for fast access) */

  int p_int_blocked;            /* nonzero if int msg blocked by busy task */
  int p_int_held;               /* nonzero if int msg held by busy syscall */
  struct proc *p_nextheld;      /* next in chain of held-up int processes */

  int p_flags;                  /* P_SLOT_FREE, SENDING, RECEIVING, etc. */
  struct mem_map p_map[NR_SEGS];/* memory map */
  pid_t p_pid;                  /* process id passed in from MM */

  clock_t user_time;            /* user time in ticks */
  clock_t sys_time;             /* sys time in ticks */
  clock_t child_utime;          /* cumulative user time of children */
  clock_t child_stime;          /* cumulative sys time of children */
  clock_t p_alarm;              /* time of next alarm in ticks, or 0 */

  struct proc *p_callerq;       /* head of list of procs wishing to send */
  struct proc *p_sendlink;      /* link to next proc wishing to send */
  message *p_messbuf;           /* pointer to message buffer */
  int p_getfrom;                /* from whom does process want to receive? */
  int p_sendto;

  struct proc *p_nextready;     /* pointer to next ready process */
  sigset_t p_pending;           /* bit map for pending signals */
  unsigned p_pendcount;         /* count of pending and unfinished signals */

  char p_name[16];              /* name of the process */
};
-----
Annex 3:
--------
/*===========================================================================*
 *                              do_exec                                      *
 *===========================================================================*/
PUBLIC int do_exec()
{
/* Perform the execve(name, argv, envp) call.  The user library builds a
 * complete stack image, including pointers, args, environ, etc.  The stack
 * is copied to a buffer inside MM, and then to the new core image.
 */
  register struct mproc *rmp;
  struct mproc *sh_mp;
  int m, r, fd, ft, sn;
  static char mbuf[ARG_MAX];    /* buffer for stack and zeroes */
  static char name_buf[PATH_MAX]; /* the name of the file to exec */
  char *new_sp, *basename;
  vir_bytes src, dst, text_bytes, data_bytes, bss_bytes, stk_bytes, vsp;
  phys_bytes tot_bytes;         /* total space for program, including gap */
  long sym_bytes;
  vir_clicks sc;
  struct stat s_buf;
  vir_bytes pc;

  /* Do some validity checks. */
  rmp = mp;
  stk_bytes = (vir_bytes) stack_bytes;
  if (stk_bytes > ARG_MAX) return(ENOMEM);      /* stack too big */
  if (exec_len <= 0 || exec_len > PATH_MAX) return(EINVAL);

  /* Get the exec file name and see if the file is executable. */
  src = (vir_bytes) exec_name;
  dst = (vir_bytes) name_buf;
  r = sys_copy(who, D, (phys_bytes) src,
                MM_PROC_NR, D, (phys_bytes) dst, (phys_bytes) exec_len);
  if (r != OK) return(r);       /* file name not in user data segment */
  tell_fs(CHDIR, who, FALSE, 0);        /* switch to the user's FS environ. */
  fd = allowed(name_buf, &s_buf, X_BIT);        /* is file executable? */
  if (fd < 0) return(fd);       /* file was not executable */

  /* Read the file header and extract the segment sizes. */
  sc = (stk_bytes + CLICK_SIZE - 1) >> CLICK_SHIFT;
  m = read_header(fd, &ft, &text_bytes, &data_bytes, &bss_bytes,
                                        &tot_bytes, &sym_bytes, sc, &pc);
  if (m < 0) {
        close(fd);              /* something wrong with header */
        return(ENOEXEC);
  }

  /* Fetch the stack from the user before destroying the old core image. */
  src = (vir_bytes) stack_ptr;
  dst = (vir_bytes) mbuf;
  r = sys_copy(who, D, (phys_bytes) src,
                MM_PROC_NR, D, (phys_bytes) dst, (phys_bytes)stk_bytes);
  if (r != OK) {
        close(fd);              /* can't fetch stack (e.g. bad virtual addr) */
        return(EACCES);
  }

  /* Can the process' text be shared with that of one already running? */
  sh_mp = find_share(rmp, s_buf.st_ino, s_buf.st_dev, s_buf.st_ctime);

  /* Allocate new memory and release old memory.  Fix map and tell kernel. */
  r = new_mem(sh_mp, text_bytes, data_bytes, bss_bytes, stk_bytes, tot_bytes);
  if (r != OK) {
        close(fd);              /* insufficient core or program too big */
        return(r);
  }

  /* Save file identification to allow it to be shared. */
  rmp->mp_ino = s_buf.st_ino;
  rmp->mp_dev = s_buf.st_dev;
  rmp->mp_ctime = s_buf.st_ctime;

  /* Patch up stack and copy it from MM to new core image. */
  vsp = (vir_bytes) rmp->mp_seg[S].mem_vir << CLICK_SHIFT;
  vsp += (vir_bytes) rmp->mp_seg[S].mem_len << CLICK_SHIFT;
  vsp -= stk_bytes;
  patch_ptr(mbuf, vsp);
  src = (vir_bytes) mbuf;
  r = sys_copy(MM_PROC_NR, D, (phys_bytes) src,
                who, D, (phys_bytes) vsp, (phys_bytes)stk_bytes);
  if (r != OK) panic("do_exec stack copy err", NO_NUM);

  /* Read in text and data segments. */
  if (sh_mp != NULL) {
        lseek(fd, (off_t) text_bytes, SEEK_CUR);  /* shared: skip text */
  } else {
        load_seg(fd, T, text_bytes);
  }
  load_seg(fd, D, data_bytes);

#if (SHADOWING == 1)
  if (lseek(fd, (off_t)sym_bytes, SEEK_CUR) == (off_t) -1) ;    /* error */
  if (relocate(fd, (unsigned char *)mbuf) < 0) ;                /* error */
  pc += (vir_bytes) rp->mp_seg[T].mem_vir << CLICK_SHIFT;
#endif

  close(fd);                    /* don't need exec file any more */

  /* Take care of setuid/setgid bits. */
  if ((rmp->mp_flags & TRACED) == 0) { /* suppress if tracing */
        if (s_buf.st_mode & I_SET_UID_BIT) {
                rmp->mp_effuid = s_buf.st_uid;
                tell_fs(SETUID,who, (int)rmp->mp_realuid, (int)rmp->mp_effuid);
        }
        if (s_buf.st_mode & I_SET_GID_BIT) {
                rmp->mp_effgid = s_buf.st_gid;
                tell_fs(SETGID,who, (int)rmp->mp_realgid, (int)rmp->mp_effgid);
        }
  }

  /* Save offset to initial argc (for ps) */
  rmp->mp_procargs = vsp;

  /* Fix 'mproc' fields, tell kernel that exec is done, reset caught sigs. */
  for (sn = 1; sn <= _NSIG; sn++) {
        if (sigismember(&rmp->mp_catch, sn)) {
                sigdelset(&rmp->mp_catch, sn);
                rmp->mp_sigact[sn].sa_handler = SIG_DFL;
                sigemptyset(&rmp->mp_sigact[sn].sa_mask);
        }
  }

  rmp->mp_flags &= ~SEPARATE;   /* turn off SEPARATE bit */
  rmp->mp_flags |= ft;          /* turn it on for separate I & D files */
  new_sp = (char *) vsp;

  tell_fs(EXEC, who, 0, 0);     /* allow FS to handle FD_CLOEXEC files */

  /* System will save command line for debugging, ps(1) output, etc. */
  basename = strrchr(name_buf, '/');
  if (basename == NULL) basename = name_buf; else basename++;
  sys_exec(who, new_sp, rmp->mp_flags & TRACED, basename, pc);
  return(OK);
}
------
Bibliography:
-------------
- Andrew S. Tanenbaum and Albert S. Woodhull, "Operating Systems: Design and Implementation,
  Second Edition", Prentice Hall, Upper Saddle River, New Jersey 07458, 1997, p.939.
- David A Rusling, "The Linux Kernel", http://www.linuxdoc.org/LDP/tlk/tlk-title.html, 1999.
- George F. Corliss, "Minix_book", http://www.mscs.mu.edu/~georgec/Classes/207.1998/14Minix_book/,
  1998.
Minix source snippets are: Copyright (c) 1987,1997, Prentice Hall. All rights reserved.
(c) Copyright 2001 Frédérick Giasson, All Rights Reserved