Deferred segment-loading

An exercise on implementing the concept of ‘load-on-demand’ for the program-segments in an ELF executable file

Background
• Recall our previous in-class exercise: we wrote a demo-program that could execute a Linux application (named ‘hello’)
• A working version of that demo is now on our class website (named ‘tryexec.s’)
• That demo simulated ‘loading’ of the .text and .data program-segments, by copying the ‘hello’ file’s memory-image into two distinct locations in extended memory

Memory-to-memory copying
• We used the Pentium’s ‘movsb’ instruction to perform those two copying operations
• The number of bytes we copied was equal to the size of five disk-sectors (5 * 512 = 2560)
• To ‘load’ the ‘.text’ program-segment, we copied from 0x00011800 to 0x08048000
• To ‘load’ the ‘.data’ program-segment, we copied from 0x00011800 to 0x08049000

Copying to extended memory
• The ‘movsb’ instruction is an example of a ‘complex’ instruction: it requires setup of several CPU registers prior to its execution
• Setup required for ‘movsb’ involves:
  – Setup DS : ESI to address the source buffer
  – Setup ES : EDI to address the dest’n buffer
  – Setup ECX with the number of bytes to copy
  – Clear the DF-bit in the EFLAGS register
• Then ‘rep movsb’ performs the string-copying
• Note that 32-bit addressing is required here, because the destination addresses lie far above the 1MB real-mode limit!
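The register setup listed above can be modeled in C to make each register's role concrete. This is only an illustrative sketch, not code from ‘tryexec.s’: the function name `rep_movsb_model` and the flat `mem` array are hypothetical, and both segment bases are assumed to be zero (as with a 4GB flat data-segment), so ESI and EDI act as plain offsets.

```c
#include <stdint.h>

/* Hypothetical model of 'rep movsb' (assumes DS and ES both have base 0).
 *   esi = offset of the source buffer
 *   edi = offset of the destination buffer
 *   ecx = number of bytes to copy
 *   df  = direction flag: 0 after 'cld' (addresses increase),
 *                         1 after 'std' (addresses decrease)
 */
static void rep_movsb_model(uint8_t *mem, uint32_t esi, uint32_t edi,
                            uint32_t ecx, int df)
{
    while (ecx != 0) {
        mem[edi] = mem[esi];      /* one 'movsb' step: copy a single byte */
        if (df) {                 /* then advance both offsets together   */
            esi--;
            edi--;
        } else {
            esi++;
            edi++;
        }
        ecx--;                    /* 'rep' counts ECX down to zero        */
    }
}
```

With df = 0 the copy walks upward from the starting offsets, which is why the DF-bit is cleared before copying a buffer from its first byte.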
Example assembly code

    ; Source-statements to ‘load’ the ‘.text’ program-segment:
            USE32                       ; assemble for 32-bit code-seg
            mov     ax, #sel_fs         ; selector for 4GB data-segment
            mov     ds, ax              ;   with base-address=0x00000000
            mov     es, ax              ;   is used for both DS and ES
            mov     esi, #0x00011800    ; offset-address for ‘source’
            mov     edi, #0x08048000    ; offset-address for ‘dest’n’
            mov     ecx, #2560          ; number of bytes to be copied
            cld                         ; use ‘forward’ string-copying
            rep                         ; ‘repeat-prefix’ is inserted
            movsb                       ;   before the ‘movsb’ opcode

Segments were ‘preloaded’
• In our ‘tryexec.s’ demo, the ‘.text’ and ‘.data’ segments were initialized in advance of transferring control to the ‘hello’ program
• That technique is called ‘preloading’
• But the Pentium supports an alternative approach to program-loading (it’s called ‘load-on-demand’)
• Segments remain ‘uninitialized’ until they are actually accessed by the application

Segment-Not-Present
• The ‘Segment-Not-Present’ exception can be utilized to implement ‘demand-loading’
• Segment-descriptors are initially marked as ‘Not Present’ (i.e., the P-bit is zero)
• When any instruction attempts to access these memory-segments (by moving the segment-selector into a segment-register), the CPU will generate exception 11 (INT-0x0B)

The Fault-Handler
• The interrupt service routine for INT-0x0B (Segment-Not-Present Fault) can perform the initialization of the specified memory region (i.e., the ‘loading’ operation), mark the segment-descriptor as ‘Present’, and then ‘retry’ the instruction that triggered the fault (by executing an ‘iret’ or ‘iretd’)

Error-Code Format

    bits 31..16    bits 15..3     bit 2    bit 1    bit 0
    reserved       table-index    TI       IDT      EXT

Legend:
    EXT = An external event caused the exception (1=yes, 0=no)
    IDT = table-index refers to Interrupt Descriptor Table (1=yes, 0=no)
    TI  = The Table Indicator flag, used when IDT=0 (1=LDT, 0=GDT)

This same error-code format is used with exceptions 0x0B, 0x0C, and 0x0D

Benefits of deferred loading?
• With a small-size program (like ‘hello’) we might not see much benefit from using the ‘load-on-demand’ mechanism, since both of the program-segments sooner-or-later would have to be ‘loaded’ into memory
• The only apparent benefit is that copying can be done by ONE program-fragment (i.e., within the fault-handler) instead of by two fragments in the ‘pre-load’ procedure

Table-driven ‘handler’
• Balanced against the fewer instructions required with ‘load-on-demand’ is the need to provide a table-driven interrupt-handler that can ‘load’ whichever ‘not present’ program-segments happen to get accessed
• A very simple implementation for such a handler could use a table like this one:

    memmap: ;       from      to           count   type
            .LONG   0x11800,  0x08048000,  2560,   0xFA
            .LONG   0x11800,  0x08049000,  2560,   0xF2

Big/Complex programs
• With complex applications that use many more program-segments, ‘demand-loading’ could potentially offer some runtime efficiencies
• For example, with interactive programs that can display various error-messages: if error-handling routines are in separate program-segments, then those segments would not need to be loaded unless, and until, the error-condition actually occurs (maybe never)

In-class exercise
• To get practical ‘hands on’ experience with implementing the demand-loading concept, we propose the following exercise
• Modify the ‘tryexec.s’ demo (see website) by deferring the memory-to-memory copy operations until the program-segments are actually referenced by the ‘hello’ program
• Then perform the copying within an ISR

Some exercise details
• Copy the ‘tryexec.s’ demo-program to a new file, named ‘ondemand.s’
• In the ‘load_and_exec_demo’ procedure, comment out the two memory-to-memory copy operations, and then mark the LDT segment-descriptors for .text and .data as ‘NOT PRESENT’ segments (i.e., P=0)
• Create a ‘memmap’ table that describes the copying operations that will be needed

Create a fault-handler
• Add an interrupt-gate for exception 0x0B and a fault-handler that will perform the copy-operation for a ‘not-present’ segment
• Remember that the CPU will automatically push an error-code onto the ring0 stack if a ‘segment-not-present’ exception occurs
• Don’t forget to discard that error-code as the final step before exiting from the ISR:

            add     esp, #4     ; discard error-code
            iretd               ; retry the instruction

Parallel table-entries

    theLDT (4 words per entry)      memmap (4 longwords per entry)
    offset  descriptor              offset  From      To          Size   Type
    0x00    0x00CF7A000000FFFF      0x00    0x11800   0x8048000   2560   0xFA
    0x08    0x00CF72000000FFFF      0x10    0x11800   0x8049000   2560   0xF2
    0x10    0x00CF72000000FFFF      0x20    0         0           0      0xF2

• Entry n of ‘theLDT’ (at offset 8n) is paired with entry n of ‘memmap’ (at offset 16n)
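
The bookkeeping such a fault-handler needs can be sketched in C before it is written in assembly. This is a hedged model, not the course's solution: the names `err_index`, `memmap_offset`, `set_access_byte`, and `handle_not_present` are hypothetical, the table contents are taken from the ‘Parallel table-entries’ slide as read here, and the actual byte-copy is omitted because those addresses are only meaningful inside the ring0 demo. It assumes the Intel convention that TI=1 in the error-code means the index refers to the LDT.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the 'memmap' table: four longwords (16 bytes). */
struct memmap_entry {
    uint32_t from;    /* source offset of the file's memory-image */
    uint32_t to;      /* destination offset for the segment       */
    uint32_t count;   /* number of bytes to copy                  */
    uint32_t type;    /* access-rights byte to install (P=1)      */
};

static const struct memmap_entry memmap[] = {
    { 0x11800, 0x08048000, 2560, 0xFA },
    { 0x11800, 0x08049000, 2560, 0xF2 },
    { 0,       0,          0,    0xF2 },
};

/* Fields of the error-code pushed for exception 0x0B. */
static unsigned err_ext(uint32_t ec)   { return ec & 1; }
static unsigned err_idt(uint32_t ec)   { return (ec >> 1) & 1; }
static unsigned err_ti(uint32_t ec)    { return (ec >> 2) & 1; }
static unsigned err_index(uint32_t ec) { return (ec >> 3) & 0x1FFF; }

/* Byte offset of the matching memmap row: the error-code with its low
 * three bits cleared is the LDT offset (8 per entry); doubling it gives
 * the memmap offset (16 per entry). */
static uint32_t memmap_offset(uint32_t ec) { return (ec & 0xFFF8u) << 1; }

/* Replace the access-rights byte (bits 40..47) of an 8-byte descriptor. */
static uint64_t set_access_byte(uint64_t desc, uint8_t access)
{
    return (desc & ~((uint64_t)0xFF << 40)) | ((uint64_t)access << 40);
}

/* Model of the handler: find the memmap row for the faulting selector,
 * mark the LDT descriptor 'Present', and return the row that describes
 * the copy. Returns NULL if the error-code does not name an LDT entry
 * that the table covers. */
static const struct memmap_entry *handle_not_present(uint64_t *ldt,
                                                     uint32_t ec)
{
    unsigned i = err_index(ec);

    if (err_idt(ec) || !err_ti(ec))   /* not an LDT descriptor */
        return NULL;
    if (i >= sizeof memmap / sizeof memmap[0])
        return NULL;

    /* A real handler would copy memmap[i].count bytes here. */
    ldt[i] = set_access_byte(ldt[i], (uint8_t)memmap[i].type);
    return &memmap[i];
}
```

For example, loading a ring3 selector for LDT entry 1 (selector 0x000F) would push the error-code 0x000C, which selects the second memmap row and changes that descriptor's access byte from 0x72 to 0xF2.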