Programming with Linux, 3rd Edition, Oct. 2009

Chapter 8 ■ Our Object All Sublime
directly. The x86 CPUs include a machine instruction that has special powers
to make use of the interrupt vector table. The INT (INTerrupt) instruction is
used by eatsyscall.asm to request the services of Linux in displaying its ad
slogan string on the screen. At two places, eatsyscall.asm has an INT 80h
instruction. When an INT 80h instruction is executed, the CPU goes down to
the interrupt vector table, fetches the address from slot 80h, and then jumps
execution to that address. The transition from user space to kernel space is
clean and completely controlled. On the other side of the address stored in
table slot 80h, the dispatcher picks up execution and performs the service that
your program requests.
The process is shown in Figure 8-5. When Linux loads at boot time, one of
the many things it does to prepare the machine for use is put correct addresses
in several of the vectors in the interrupt vector table. One of these addresses is
the address of the kernel services dispatcher, which goes into slot 80h.
Later, when you type the name of your program eatsyscall on the Linux
console command line, Linux loads the eatsyscall executable into user space
memory and allows it to execute. To gain access to kernel services, eatsyscall
executes INT 80h instructions as needed. Nothing in your program needs
to know anything more about the Linux kernel services dispatcher than its
number in the interrupt vector table. Given that single number, eatsyscall is
content to remain ignorant and simply let the INT 80h instruction and interrupt
vector 80h take it where it needs to go.
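To make the mechanism concrete, here is a toy model in Python. It is a simulation under stated assumptions, not how the hardware or the kernel is actually written. The table is a list of 256 slots (x86 supports 256 interrupt vectors), a "handler address" is simply a Python function, and the names `vector_table`, `boot`, `kernel_dispatcher`, and `int_instruction` are hypothetical. It shows the point the text makes: the caller supplies only the vector number and never needs the dispatcher's address.

```python
# Toy model of an interrupt vector table (a simulation, not real hardware).
# Each of the 256 slots holds the "address" of a handler; here a handler
# is just a Python function. All names are hypothetical.

VECTOR_COUNT = 256


def unhandled(regs):
    raise RuntimeError("no handler installed in this vector")


vector_table = [unhandled] * VECTOR_COUNT


def kernel_dispatcher(regs):
    # Stand-in for the Linux kernel services dispatcher.
    return f"dispatcher called with EAX={regs['eax']}"


def boot():
    # At "boot time" the kernel stores its dispatcher in slot 80h.
    vector_table[0x80] = kernel_dispatcher


def int_instruction(vector, regs):
    # INT n: fetch whatever is stored in slot n and transfer control to it.
    # The caller supplies only the number, never the dispatcher's address.
    handler = vector_table[vector]
    return handler(regs)


boot()
print(int_instruction(0x80, {"eax": 4}))  # dispatcher called with EAX=4
```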
On the northwest side of Chicago, where I grew up, there was a bus that
ran along Milwaukee Avenue. All Chicago bus routes have numbers, and the
Milwaukee Avenue route is number 56. It started somewhere in the tangled
streets just north of downtown, and ended up in a forest preserve just inside the
city limits. The Forest Preserve District ran a swimming pool called Whelan
Pool in that forest preserve. Kids all along Milwaukee Avenue could not
necessarily have told you the address of Whelan Pool, but they could tell you
in a second how to get there: Just hop on bus number 56 and take it to the
end of the line. It's like that with software interrupts. Find the number of the
vector that reliably points to your destination and ride that vector to the end
of the line, without worrying about the winding route or the precise address
of your destination.
Behind the scenes, the INT 80h instruction does something else: it pushes the
address of the next instruction (that is, the instruction immediately following
the INT 80h instruction) onto the stack before it follows vector 80h into the
Linux kernel. Like Hansel and Gretel, the INT 80h instruction leaves some
breadcrumbs on the stack as a way of helping the CPU find its way back
to the eatsyscall program after the excursion down into Linux, but more on
that later.
Now, the Linux kernel services dispatcher controls access to more than 200 individual
service routines. How does it know which one to execute? You have to tell
the dispatcher which service you need, which you do by placing the service's
number in register EAX. The dispatcher may require other information as well,
and will expect you to provide that information in the correct place—almost
always in various registers—before it begins its job.
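A dispatcher that picks a routine by number is essentially a lookup table. The Python sketch below models that idea under loudly stated assumptions: registers are a dictionary, user memory is a small `bytearray`, and open files are byte buffers keyed by descriptor. The service numbers 1 (`sys_exit`) and 4 (`sys_write`) are the real 32-bit x86 Linux numbers, and returning -ENOSYS for an unknown number mirrors what Linux does. Everything else, including the names `SERVICES` and `dispatcher`, is hypothetical.

```python
# Toy model of the kernel services dispatcher: EAX picks the service,
# EBX/ECX/EDX carry that service's arguments. Service numbers are the
# 32-bit x86 Linux ones (1 = sys_exit, 4 = sys_write); the "memory" and
# "files" are simplifications for illustration.

ENOSYS = 38                      # Linux error number for "no such syscall"

memory = bytearray(64)           # hypothetical flat user memory
files = {1: bytearray()}         # fd 1 (stdout) collects written bytes


def sys_exit(regs):
    # A real sys_exit never returns; the model just reports the request.
    return ("exit", regs["ebx"])  # exit status travels in EBX


def sys_write(regs):
    fd, addr, length = regs["ebx"], regs["ecx"], regs["edx"]
    files[fd] += memory[addr:addr + length]
    return length                # write reports how many bytes it wrote


SERVICES = {1: sys_exit, 4: sys_write}


def dispatcher(regs):
    service = SERVICES.get(regs["eax"])
    if service is None:
        return -ENOSYS           # unknown service number
    return service(regs)


msg = b"Hello, Linux!\n"
memory[0:len(msg)] = msg
print(dispatcher({"eax": 4, "ebx": 1, "ecx": 0, "edx": len(msg)}))  # 14
print(bytes(files[1]))                                               # b'Hello, Linux!\n'
```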
Figure 8-5: Riding an interrupt vector into Linux. (The INT 80h instruction first pushes the address of the instruction after it onto the stack, and then jumps to whatever address is stored in vector 80h, which takes execution from user space into the Linux system call dispatcher in kernel space.)
Look at the following lines of code from eatsyscall.asm:
mov eax,4         ; Specify sys_write syscall
mov ebx,1         ; Specify File Descriptor 1: Standard Output
mov ecx,EatMsg    ; Pass offset of the message
mov edx,EatLen    ; Pass the length of the message
int 80H           ; Make syscall to output the text to stdout
This sequence of instructions requests that Linux display a text string on
the console. The first line sets up a vital piece of information: the number
of the service that we're requesting. In this case, it's sys_write, service
number 4, which writes data to a Linux file. Remember that in Linux, just
about everything is a file, and that includes the console. The second line tells
Linux which file to write to: standard output. Every open file is identified by a
numeric file descriptor, and the first three (0, 1, and 2) are reserved by
convention for standard input, standard output, and standard error.
The file descriptor for standard output is therefore 1.
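In a higher-level language, the same kernel service is usually one call away. The Python sketch below assumes a Linux or other POSIX system. `os.write(fd, data)` is Python's thin wrapper around the write system call, the same service eatsyscall reaches with EAX set to 4 on 32-bit x86. It illustrates the "everything is a file" idea: one call writes to standard output (descriptor 1) and to an ordinary temporary disk file, and only the descriptor changes. The constant `STDOUT` is a name introduced here for readability.

```python
import os
import tempfile

STDOUT = 1  # standard output is file descriptor 1 by convention

# os.write(fd, data) wraps the write system call and returns the byte count.
n = os.write(STDOUT, b"to the console\n")
print(n)  # 15

# "Everything is a file": the identical call writes to a disk file;
# only the file descriptor changes.
with tempfile.TemporaryFile() as f:
    os.write(f.fileno(), b"to a disk file\n")
    f.seek(0)          # rewind so the bytes just written can be read back
    saved = f.read()
print(saved)  # b'to a disk file\n'
```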
The third line places the address of the string to be displayed in ECX. That's
how Linux knows what it is that you want to display. The dispatcher expects
the address to be in ECX, but the address is simply where the string begins.
Linux also needs to know the string's length, and we place that value in
register EDX.
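Why does Linux need the length as well as the address? sys_write does not look for any end-of-string marker. It sends exactly the number of bytes it is told to send, starting at the address it is given. The sketch below is a Python approximation, not real register code. It uses a hypothetical `buffer` in which the message is followed by unrelated bytes. `msg_addr` stands in for ECX and `msg_len` stands in for EDX, and only the first `msg_len` bytes reach standard output.

```python
import os

# Hypothetical "memory": the message is followed by unrelated bytes,
# just as a string in memory is followed by whatever happens to come next.
buffer = bytearray(b"Hello, Linux!\nUNRELATED DATA")

msg_addr = 0                         # plays the role of ECX: where the string begins
msg_len = len(b"Hello, Linux!\n")    # plays the role of EDX: 14 bytes to send

# write() sends exactly msg_len bytes starting at msg_addr; it never scans
# for a terminator, so the unrelated bytes after the message are not written.
sent = os.write(1, memoryview(buffer)[msg_addr:msg_addr + msg_len])
print(sent)  # 14
```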
With the kernel service number, the address of the string, and the string's
length tucked into their appropriate registers, we take a trip to the dispatcher
by executing INT 80h. The INT instruction is all it takes. Boom! Execution
crosses the bridge into kernel space, where Linux the troll reads the string at
the address in ECX and sends it to the console through mechanisms it keeps more or less to
itself. Most of the time, that's a good thing: there can be too much information
in descriptions of programming machinery, just as in descriptions of your
personal life.
Getting Home Again
So much for getting into Linux. How does execution get back home again?
The address in vector 80h took execution into the kernel services dispatcher,
but how does Linux know where to go to pass execution back into eatsyscall?
Half of the cleverness of software interrupts is knowing how to get there, and
the other half—just as clever—is knowing how to get back.
To resume execution where it left off, at the instruction just after INT 80h,
Linux has to look in a completely reliable place for the return address, and that
completely reliable place is none other than the top of the stack.
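The round trip can be modeled with a list used as a stack. This is a simplified Python sketch, not a CPU emulator. The addresses are made-up integers, and `ToyCPU`, `interrupt`, and `iret` are hypothetical names, with `iret` modeled on the x86 IRET (interrupt return) instruction. The two-byte length of INT 80h matches its real encoding (CD 80). The real INT instruction also saves the flags and code segment. Like the text, this model keeps only the return address.

```python
# Simplified model of INT 80h's round trip through the stack.
# Addresses are made-up integers; this is not a CPU emulator.

INT_INSTRUCTION_LENGTH = 2  # INT n is two bytes long (INT 80h = CD 80)


class ToyCPU:
    def __init__(self, vector_table):
        self.vector_table = vector_table  # vector number -> handler address
        self.stack = []                   # the top of the stack is the list's end
        self.eip = 0                      # address of the current instruction

    def interrupt(self, vector):
        # Leave a breadcrumb: push the address of the NEXT instruction...
        self.stack.append(self.eip + INT_INSTRUCTION_LENGTH)
        # ...then jump to whatever address is stored in the vector.
        self.eip = self.vector_table[vector]

    def iret(self):
        # Modeled on x86 IRET: pop the return address off the top of the stack.
        self.eip = self.stack.pop()


cpu = ToyCPU({0x80: 0xC0001000})  # hypothetical dispatcher address
cpu.eip = 0x08048080               # hypothetical location of the INT 80h
cpu.interrupt(0x80)
print(hex(cpu.eip), [hex(a) for a in cpu.stack])  # 0xc0001000 ['0x8048082']
cpu.iret()
print(hex(cpu.eip))                               # 0x8048082
```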
I mentioned earlier (without much emphasis) that the INT 80h instruction
pushes an address to the top of the stack before it launches off into the
unknown. This address is the address of the next instruction in line for
execution: the instruction immediately following the INT 80h instruction. This
location is completely reliable because, just as there is only one interrupt vector
table in the machine, there is only one stack in operation at any one time. This
means that there is only one top of the stack—that is, at the address pointed