Chapter 8 ■ Our Object All Sublime

directly. The x86 CPUs include a machine instruction that has special powers to make use of the interrupt vector table. The INT (INTerrupt) instruction is used by eatsyscall.asm to request the services of Linux in displaying its ad slogan string on the screen. At two places, eatsyscall.asm has an INT 80h instruction. When an INT 80h instruction is executed, the CPU goes down to the interrupt vector table, fetches the address from slot 80h, and then jumps execution to that address. The transition from user space to kernel space is clean and completely controlled. On the other side of the address stored in table slot 80h, the dispatcher picks up execution and performs the service that your program requests.

The process is shown in Figure 8-5. When Linux loads at boot time, one of the many things it does to prepare the machine for use is put correct addresses in several of the vectors in the interrupt vector table. One of these addresses is the address of the kernel services dispatcher, which goes into slot 80h.

Later, when you type the name of your program eatsyscall on the Linux console command line, Linux loads the eatsyscall executable into user space memory and allows it to execute. To gain access to kernel services, eatsyscall executes INT 80h instructions as needed. Nothing in your program needs to know anything more about the Linux kernel services dispatcher than its number in the interrupt vector table. Given that single number, eatsyscall is content to remain ignorant and simply let the INT 80h instruction and interrupt vector 80h take it where it needs to go.

On the northwest side of Chicago, where I grew up, there was a bus that ran along Milwaukee Avenue. All Chicago bus routes have numbers, and the Milwaukee Avenue route is number 56. It started somewhere in the tangled streets just north of downtown, and ended up in a forest preserve just inside the city limits.
The Forest Preserve District ran a swimming pool called Whelan Pool in that forest preserve. Kids all along Milwaukee Avenue could not necessarily have told you the address of Whelan Pool, but they could tell you in a second how to get there: Just hop on bus number 56 and take it to the end of the line. It's like that with software interrupts. Find the number of the vector that reliably points to your destination and ride that vector to the end of the line, without worrying about the winding route or the precise address of your destination.

Behind the scenes, the INT 80h instruction does something else: it pushes the address of the next instruction (that is, the instruction immediately following the INT 80h instruction) onto the stack, before it follows vector 80h into the Linux kernel. Like Hansel and Gretel, the INT 80h instruction pushes some breadcrumbs to the stack as a way of helping the CPU find its way back to the eatsyscall program after the excursion down into Linux. More on that later.

Now, the Linux kernel services dispatcher controls access to 200 individual service routines. How does it know which one to execute? You have to tell the dispatcher which service you need, which you do by placing the service's number in register EAX. The dispatcher may require other information as well, and will expect you to provide that information in the correct place (almost always in various registers) before it begins its job.

[Figure 8-5 diagram. Callouts: "The INT 80h instruction first pushes the address of the instruction after it onto the stack... and then jumps to whatever address is stored in vector 80h." "The address at vector 80h takes execution into the Linux system call dispatcher." Labels: The Stack, Return Address, Your Code, INT 80h, (Next Instruction), User Space, Kernel Space, Linux, Dispatcher, Vector Table, Vector 80h.]

Figure 8-5: Riding an interrupt vector into Linux

Look at the following lines of code from eatsyscall.asm:

    mov eax,4       ; Specify sys_write syscall
    mov ebx,1       ; Specify File Descriptor 1: Standard Output
    mov ecx,EatMsg  ; Pass offset of the message
    mov edx,EatLen  ; Pass the length of the message
    int 80H         ; Make syscall to output the text to stdout

This sequence of instructions requests that Linux display a text string on the console. The first line sets up a vital piece of information: the number of the service that we're requesting. In this case, it's sys_write, service number 4, which writes data to a Linux file. Remember that in Linux, just about everything is a file, and that includes the console. The second line tells Linux which file to write to: standard output. Every file must have a numeric file descriptor, and the first three (0, 1, and 2) are standard and never change. The file descriptor for standard output is 1.

The third line places the address of the string to be displayed in ECX. That's how Linux knows what it is that you want to display. The dispatcher expects the address to be in ECX, but the address is simply where the string begins. Linux also needs to know the string's length, and we place that value in register EDX.

With the kernel service number, the file descriptor, the address of the string, and the string's length tucked into their appropriate registers, we take a trip to the dispatcher by executing INT 80h. The INT instruction is all it takes. Boom! Execution crosses the bridge into kernel space, where Linux the troll reads the string at ECX and sends it to the console through mechanisms it keeps more or less to itself.
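To see the register setup in context, here is a minimal sketch of how the quoted fragment might sit inside a complete program. Only the five-instruction sys_write sequence comes from the listing above. The message text, the definitions of EatMsg and EatLen, the _start label, and the closing sys_exit request (service number 1 on 32-bit x86 Linux) are assumptions added for illustration, and the real eatsyscall.asm may differ in its details.

```nasm
; A sketch only: assumes 32-bit x86 Linux and the NASM assembler.

section .data
    EatMsg: db "Eat at Joe's!",10   ; Assumed message text, plus a newline
    EatLen: equ $-EatMsg            ; Length of the message, computed by NASM

section .text
    global _start                   ; Entry point the linker looks for

_start:
    mov eax,4           ; Service number for sys_write goes in EAX
    mov ebx,1           ; File descriptor 1: standard output
    mov ecx,EatMsg      ; Address where the string begins
    mov edx,EatLen      ; Number of bytes to write
    int 80h             ; Ride vector 80h into the dispatcher

    mov eax,1           ; Assumed second request: sys_exit (service 1)
    mov ebx,0           ; Return code of zero
    int 80h             ; Ride vector 80h again; this call does not return
```

Assuming NASM and a GNU linker with 32-bit support are installed, something like `nasm -f elf32 eatsketch.asm` followed by `ld -m elf_i386 -o eatsketch eatsketch.o` should build it (the file name eatsketch is made up for this sketch). Note that both requests go through the same vector; only the value in EAX tells the dispatcher which service is wanted.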
Most of the time, it's a good thing that Linux keeps such mechanisms to itself: there can be too much information in descriptions of programming machinery, just as in descriptions of your personal life.

Getting Home Again

So much for getting into Linux. How does execution get back home again? The address in vector 80h took execution into the kernel services dispatcher, but how does Linux know where to go to pass execution back into eatsyscall?

Half of the cleverness of software interrupts is knowing how to get there, and the other half (just as clever) is knowing how to get back. To continue execution where it left off prior to the INT 80h instruction, Linux has to look in a completely reliable place for the return address, and that completely reliable place is none other than the top of the stack.

I mentioned earlier (without much emphasis) that the INT 80h instruction pushes an address to the top of the stack before it launches off into the unknown. This address is the address of the next instruction in line for execution: the instruction immediately following the INT 80h instruction.

This location is completely reliable because, just as there is only one interrupt vector table in the machine, there is only one stack in operation at any one time. This means that there is only one top of the stack, that is, at the address pointed