Deferred segment-loading
An exercise on implementing the concept of ‘load-on-demand’

The ‘do-it-later’ philosophy
• Modern operating systems often follow a
policy of deferring work whenever possible
• The advantage of adopting this practice is
most evident in those cases where it turns
out that the work was not needed after all
• Example: Many programs contain lots of
code and data for diagnosing errors – but
it’s not needed if no errors actually occur
Avoiding wasted effort
• Thus it will be more efficient if an OS does
not always take time to load those portions
of a program (such as its error-diagnostics
and error-recovery routines) which may be
unnecessary in the majority of situations
• But of course the OS needs to be ready to
take a ‘timeout’ for loading those routines
when and if the need becomes apparent
Another example
• In a multitasking environment, many tasks
are taking turns at executing instructions
• The CPU typically performs task-switching
many times every second – and must do
a ‘save’ of the outgoing task’s context, and
a ‘load’ of the incoming task’s context, any
time it switches from one task to the next
• We ask: can any of this work be deferred?
The NPX registers
• Only a few tasks typically make any use of
the Pentium’s ‘floating-point’ (NPX, Numeric
Processor eXtension) registers, so
it’s wasteful to do a ‘save-and-reload’ for
these registers with every task-switch
• The TS-bit (bit #3 in Control Register 0) is
designed to assist an OS in implementing
a policy of ‘lazy’ context-switching for the
set of registers used in floating-point work
Example: effect of TS=1
• Each time the CPU performs a task-switch
it automatically sets the TS-bit to 1 (only
an OS can execute a ‘clts’ to reset TS=0)
• When any task tries to execute any of the
NPX instructions (to do some arithmetic
with values in the floating-point registers),
an exception 7 fault will occur if the TS-bit
hasn’t been cleared since a task-switch
The fault-7 exception-handler
• The work involved in saving the contents
of the floating-point registers being used
by a no-longer-active task, and reloading
those registers with values that the active
task expects to work on, can be deferred
to the fault-handler for exception-7
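• The handler first clears the TS-bit (with
‘clts’), since the save and restore instructions
are themselves NPX instructions and would
otherwise fault too; then it ‘retries’ the
instruction that caused this ‘fault’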
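The lazy-switching policy described on the last three slides can be modelled in ordinary C. The sketch below is a user-space simulation, not kernel code: `cr0`, `npx`, `task_switch()` and `fault7_handler()` are hypothetical stand-ins for Control Register 0, the physical floating-point registers, the CPU’s task-switch and the exception-7 handler, and a plain struct copy stands in for the real save and restore instructions.

```c
#include <assert.h>

#define NTASKS 4
#define CR0_TS (1u << 3)                 /* TS is bit #3 of Control Register 0 */

typedef struct { double st[8]; } npx_state;   /* stand-in for one saved NPX image */

static unsigned  cr0 = CR0_TS;           /* simulated CR0; TS=1 as if a switch just happened */
static npx_state npx;                    /* simulated physical NPX registers */
static npx_state saved[NTASKS];          /* per-task save areas */
static int       npx_owner = -1;         /* task whose values are in the NPX (-1: none) */
static int       current = 0;            /* task now running */
static int       fault7_count = 0;       /* how often exception 7 was taken */

/* The task-switch only sets TS; it does not touch the NPX registers. */
static void task_switch(int next) {
    assert(next >= 0 && next < NTASKS);
    current = next;
    cr0 |= CR0_TS;
}

/* Model of the exception-7 handler. */
static void fault7_handler(void) {
    fault7_count++;
    cr0 &= ~CR0_TS;                      /* 'clts' first, so the save/restore cannot fault */
    if (npx_owner != current) {
        if (npx_owner >= 0)
            saved[npx_owner] = npx;      /* save the previous owner's registers */
        npx = saved[current];            /* reload the active task's registers */
        npx_owner = current;
    }
}

/* Every simulated NPX instruction checks TS first; after the handler runs,
 * the instruction is 'retried' by simply carrying on. */
static void npx_add(int reg, double value) {
    assert(reg >= 0 && reg < 8);
    if (cr0 & CR0_TS)
        fault7_handler();
    npx.st[reg] += value;
}

static double npx_read(int reg) {
    assert(reg >= 0 && reg < 8);
    if (cr0 & CR0_TS)
        fault7_handler();
    return npx.st[reg];
}
```

Switching back to the task that still owns the NPX costs one cheap fault and no save/restore at all; register contents are moved only when a different task actually touches the floating-point registers.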
The ‘fork()’ system-call
• In a UNIX/Linux operating system, the way
any new task gets created is by a call to the
kernel’s ‘fork()’ service-function
• This function is supposed to ‘duplicate’ the
entire program-environment of the calling
task (i.e., code, data, stack and heap, plus
the kernel’s process-control data-structure)
• But much of this work is often wasted!
The ‘fork-and-exec’ scenario
• In practice, the most common reason for a
program to ‘fork()’ a child-process is so the
child-task can launch a separate program:
if ( fork() == 0 ) execl( "newprog", newargs, (char *) 0 );
• In these cases the ‘duplicated’ code, data,
and heap are not relevant to the new task
-- and so they will simply get discarded!
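A complete version of the one-line fork-and-exec idiom is sketched below. It assumes a POSIX system with a shell at `/bin/sh`; `run_in_child()` is a hypothetical helper name, and the shell command it runs takes the place of the ‘newprog’ placeholder.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs 'command' via /bin/sh in a child process and returns the child's
 * exit status, or -1 if anything fails. */
static int run_in_child(const char *command) {
    assert(command != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;                            /* fork() failed */
    if (pid == 0) {
        /* Child: the duplicated code, data and heap are discarded right here. */
        execl("/bin/sh", "sh", "-c", command, (char *) 0);
        _exit(127);                           /* reached only if execl() fails */
    }
    int status;                               /* parent: wait for the child */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Modern Linux kernels avoid most of the copying that fork() implies by marking the duplicated pages copy-on-write, which is the same ‘do-it-later’ idea applied to paging rather than to segments.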
‘loading-on-demand’
• An OS can avoid all the wasted effort of
duplicating a parent-task’s resources (its
code, data, heap, etc.) by implementing
“only upon demand” loading as a policy
• For an OS that uses the CPU’s memory-segmentation
capabilities, an ‘on demand’ policy can be
implemented by using the Pentium’s
‘Segment-Not-Present’ exception
How it works
• Segments remain ‘uninitialized’ until they
are actually accessed by an application
• Segment-descriptors are initially marked
as ‘Not Present’ (i.e., their P-bit is zero)
• When any instruction attempts to access
such a memory-segment (read, write, or
fetch), the CPU responds by generating
exception-11: “Segment-Not-Present”
An ‘error-code’ is pushed
• Besides pushing the memory-address of
the faulting instruction onto the
exception-handler’s stack, the CPU also
pushes an ‘error-code’ to indicate which
descriptor was not yet marked as ‘Present’
• The handler can then ‘load’ that segment
with the proper information and adjust its
descriptor’s P-bit, then retry the instruction
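User programs on today’s systems cannot manage segments, but the same not-present, fault, load, retry cycle can be imitated with pages. This sketch assumes Linux (or another POSIX system with `MAP_ANONYMOUS`): a region is mapped with no access rights, a SIGSEGV handler plays the part of the exception-11 handler, and returning from that handler retries the faulting read. All names are hypothetical, and POSIX does not list `mprotect()` as async-signal-safe, so treat this as a demonstration technique rather than a portable one.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static unsigned char *region;                 /* starts out 'not present' */
static size_t region_size;
static volatile sig_atomic_t load_count;      /* how often the region was loaded */

/* Hypothetical loader: fills the region with the contents the program expects. */
static void load_region(void) {
    for (size_t i = 0; i < region_size; i++)
        region[i] = (unsigned char) (i & 0xFF);
}

/* SIGSEGV stands in for exception-11: make the region accessible, load it,
 * then return so the faulting instruction is retried. */
static void on_not_present(int sig, siginfo_t *info, void *ctx) {
    uintptr_t addr = (uintptr_t) info->si_addr;
    uintptr_t base = (uintptr_t) region;
    (void) ctx;
    if (region == NULL || addr < base || addr - base >= region_size) {
        signal(sig, SIG_DFL);                 /* not our region: let the retry crash */
        return;
    }
    mprotect(region, region_size, PROT_READ | PROT_WRITE);   /* like setting the P-bit */
    load_region();                            /* like initializing the segment */
    load_count++;
}

/* Reserves 'pages' pages with no access rights and installs the handler. */
static int demand_region_init(size_t pages) {
    struct sigaction sa;
    assert(pages > 0);
    region_size = pages * (size_t) sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, region_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    region = p;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_not_present;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Volatile read, so each access happens exactly where it is written. */
static unsigned char region_byte(size_t i) {
    return ((volatile unsigned char *) region)[i];
}
```

As with a segment, the whole region is loaded on the first touch, so later accesses proceed without any further faults.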
Error-Code Format
Bits 31..16: reserved
Bits 15..3: table-index
Bit 2: TI
Bit 1: IDT
Bit 0: EXT
Legend:
EXT = An external event caused the exception (1=yes, 0=no)
IDT = table-index refers to Interrupt Descriptor Table (1=yes, 0=no)
TI = The Table Indicator flag, used when IDT=0 (0=GDT, 1=LDT)
This same error-code format is used with exceptions 0x0B, 0x0C, and 0x0D
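The error-code layout translates directly into shifts and masks. The sketch below decodes a selector error-code; `decode_error_code()`, `error_code_table()` and the enum names are hypothetical, and the sample values are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum desc_table { TABLE_GDT, TABLE_LDT, TABLE_IDT };

typedef struct {
    unsigned index;   /* bits 15..3: descriptor number within its table */
    unsigned ti;      /* bit 2: 0 = GDT, 1 = LDT (meaningful only when idt == 0) */
    unsigned idt;     /* bit 1: 1 = index refers to the IDT */
    unsigned ext;     /* bit 0: 1 = an external event caused the exception */
} selector_error;

static selector_error decode_error_code(uint32_t code) {
    selector_error e;
    e.ext   = code & 1u;
    e.idt   = (code >> 1) & 1u;
    e.ti    = (code >> 2) & 1u;
    e.index = (code >> 3) & 0x1FFFu;   /* 13 bits; bits 31..16 are reserved */
    return e;
}

/* Which descriptor table the index refers to. */
static enum desc_table error_code_table(selector_error e) {
    if (e.idt)
        return TABLE_IDT;
    return e.ti ? TABLE_LDT : TABLE_GDT;
}
```

For example, an error-code of 0x0010 names GDT entry #2, 0x000C names LDT entry #1, and 0x005A names IDT entry #11.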
Our ‘simulation’ demo
• We can illustrate the ‘just-in-time’ idea by
writing a program that performs a ‘far’ call
to an ‘uninitialized’ region of memory:
	lcall	$sel_CS, $draw_message
• The code-segment descriptor (referenced
here by the selector-value ‘sel_CS’) will be
initially marked ‘Not-Present’ (so this ‘lcall’
instruction will trigger an exception-11)
Our ‘fault-handler’
• Our Interrupt-Service-Routine for fault-11
will do two things:
• Initialize the memory-region with code and data
• Mark the code-segment’s descriptor as ‘Present’
• It will carefully preserve the CPU registers,
so that it can ‘retry’ the faulting instruction
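The demo’s control flow can be modelled without any privileged code. In this sketch (all names hypothetical) `gdt[]` holds 64-bit descriptors, `far_call()` checks the P-bit the way the CPU does for ‘lcall’, the error-code is assumed to be the selector with EXT=0 and IDT=0, and a `while` loop stands in for the CPU re-executing the faulting instruction after the handler returns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define P_BIT  (1ull << 47)          /* descriptor 'Present' bit */
#define NDESC  8

static uint64_t      gdt[NDESC];     /* every descriptor starts with P=0 */
static unsigned char low_arena[64];  /* stands in for ARENA #1 (loaded image) */
static unsigned char high_arena[64]; /* stands in for ARENA #2 (uninitialized) */
static int           fault11_count;

/* Model of the fault-11 handler: initialize the arena, then mark it present. */
static void fault11_handler(uint16_t error_code) {
    assert((error_code & 0x0007) == 0);                  /* this model only has GDT entries */
    unsigned index = error_code >> 3;
    fault11_count++;
    memcpy(high_arena, low_arena, sizeof high_arena);    /* 'initialize_the_high_arena' */
    gdt[index] |= P_BIT;                                 /* 'mark_segment_as_ready' */
}

/* Model of 'lcall $selector, $offset': returns the byte found at the target. */
static int far_call(uint16_t selector, unsigned offset) {
    unsigned index = selector >> 3;
    assert(index < NDESC && offset < sizeof high_arena);
    while (!(gdt[index] & P_BIT))                        /* P=0: fault, then retry */
        fault11_handler((uint16_t) (selector & 0xFFFC));
    return high_arena[offset];
}
```

Only the first call through the selector faults; once the handler has set the P-bit, later calls go straight through.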
Where is the ‘error-code’?
Layout of our fault-handler’s stack
(because we used a 286 interrupt-gate), in 16-bit words:
SS:SP+6	FLAGS
SS:SP+4	CS
SS:SP+2	IP
SS:SP+0	error-code
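The same layout can be written down as a C structure, which makes the offsets checkable. This is a sketch: `struct frame16` and `read_frame16()` are hypothetical names, and the byte values used to exercise it are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four words a 16-bit (286) interrupt-gate leaves for fault 11,
 * in increasing address order starting at SS:SP. */
struct frame16 {
    uint16_t error_code;   /* SS:SP+0 */
    uint16_t ip;           /* SS:SP+2 */
    uint16_t cs;           /* SS:SP+4 */
    uint16_t flags;        /* SS:SP+6 */
};

/* Decodes the frame from raw stack bytes; x86 stores words little-endian. */
static struct frame16 read_frame16(const uint8_t sp[8]) {
    struct frame16 f;
    f.error_code = (uint16_t) (sp[0] | (sp[1] << 8));
    f.ip         = (uint16_t) (sp[2] | (sp[3] << 8));
    f.cs         = (uint16_t) (sp[4] | (sp[5] << 8));
    f.flags      = (uint16_t) (sp[6] | (sp[7] << 8));
    return f;
}
```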
The Pentium provides a special pair of instructions, ‘enter’ and ‘leave’,
that build (and later discard) a stack-frame, so that a procedure can use
register BP to address any parameter-values that reside on its stack
Code using ‘enter’ and ‘leave’
isrNPF:				# Our fault-handler for exception-0x0B
	enter	$0, $0		# setup stackframe access
	call	initialize_the_high_arena
	call	mark_segment_as_ready
	leave			# discard the frame access
	add	$2, %sp		# discard the error-code
	iret			# ‘retry’ the faulting instruction
What does ‘enter’ do?
• The effect of the single instruction
	enter	$0, $0
is equivalent to this instruction-sequence:
	push	%bp
	mov	%sp, %bp
How the stack is changed
Layout of our fault-handler’s stack BEFORE executing ‘enter’ (16-bit words):
SS:SP+6	FLAGS
SS:SP+4	CS
SS:SP+2	IP
SS:SP+0	error-code

Layout of our fault-handler’s stack AFTER executing ‘enter’ (16-bit words):
SS:BP+8	FLAGS
SS:BP+6	CS
SS:BP+4	IP
SS:BP+2	error-code
SS:BP+0	old-BP	(SS:SP also points here)
NOTE: Any memory-references that use indirect addressing via register BP
will use the SS segment-register by default (not the DS segment-register);
for example, this instruction tests the EXT, IDT and TI bits of the
error-code at SS:BP+2:
	testw	$0x0007, 2(%bp)
What does ‘leave’ do?
• The effect of the single instruction
	leave
is equivalent to this instruction-sequence:
	mov	%bp, %sp
	pop	%bp
How the stack is changed
Layout of our fault-handler’s stack BEFORE executing ‘leave’ (16-bit words):
SS:BP+8	FLAGS
SS:BP+6	CS
SS:BP+4	IP
SS:BP+2	error-code
SS:BP+0	old-BP
…	other pushed words (SS:SP points to the last one)

Layout of our fault-handler’s stack AFTER executing ‘leave’ (16-bit words):
SS:SP+6	FLAGS
SS:SP+4	CS
SS:SP+2	IP
SS:SP+0	error-code
So the effect of ‘leave’ is to
undo the effect of ‘enter’
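To check the offsets in these diagrams, the sketch below models a 16-bit stack as an array of words and implements ‘enter $0, $0’ and ‘leave’ as their two-instruction equivalents. It is a simulation with hypothetical names (`cpu16`, `push`, `pop`, `enter00`, `leave`, `at_bp`), not a description of how the CPU implements these instructions internally.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 128

/* A toy 16-bit stack: sp and bp are byte offsets into mem[], and the
 * stack grows toward lower offsets, as SS:SP does. */
typedef struct {
    uint16_t mem[STACK_BYTES / 2];
    unsigned sp, bp;
} cpu16;

static void push(cpu16 *c, uint16_t v) {
    assert(c->sp >= 2);
    c->sp -= 2;
    c->mem[c->sp / 2] = v;
}

static uint16_t pop(cpu16 *c) {
    assert(c->sp + 2 <= STACK_BYTES);
    uint16_t v = c->mem[c->sp / 2];
    c->sp += 2;
    return v;
}

/* enter $0, $0  is  push %bp ; mov %sp, %bp */
static void enter00(cpu16 *c) { push(c, (uint16_t) c->bp); c->bp = c->sp; }

/* leave  is  mov %bp, %sp ; pop %bp */
static void leave(cpu16 *c) { c->sp = c->bp; c->bp = pop(c); }

/* The word that an operand such as 2(%bp) refers to (SS is implied). */
static uint16_t at_bp(const cpu16 *c, unsigned disp) {
    return c->mem[(c->bp + disp) / 2];
}
```

Because ‘leave’ reloads SP from BP, any words pushed after ‘enter’ are discarded along with old-BP, which is why the handler only needs ‘add $2, %sp’ afterwards to drop the error-code.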
Our demo’s memory-layout
0x00030000	ARENA #3 (not used by this demo)
0x00020000	ARENA #2 (where our demo expects drawing code will reside)
0x00010000	ARENA #1 (where the loader puts our program code and data)
0x00007C00	BOOT_LOCN
0x00000000
The demo copies the contents of ARENA #1 to ARENA #2.
Efficient copying
• We use the Pentium’s ‘rep movsw’ instruction to
perform memory-to-memory copying operations
• The segment-selector for the segment we copy
from (it must be ‘readable’) goes into register
DS, and the segment-selector for the segment
we copy to (it must be ‘writable’) goes into ES;
the starting offsets go into SI and DI
• The number of words we will copy (placed in CX)
should match the size of our code-segment
(64KB, i.e. 0x8000 words)
• The Direction-Flag should be cleared (DF=0)
so that SI and DI advance after each word
Example assembly code
	cld			# use ‘forward’ string-copying

	mov	$sel_ds, %si	# selector for arena at 0x10000
	mov	%si, %ds	# goes in segment-register DS
	xor	%si, %si	# start copying from offset zero

	mov	$sel_DS, %di	# selector for arena at 0x20000
	mov	%di, %es	# goes in segment-register ES
	xor	%di, %di	# start copying to offset zero

	mov	$0x8000, %cx	# number of words to be copied
	rep	movsw		# perform the arena-copying
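The arithmetic behind the CX value (a 64KB segment holds 0x10000 / 2 = 0x8000 words) and the register roles of ‘rep movsw’ can be mirrored in C. In this sketch `rep_movsw()` and `copy_arena()` are hypothetical helpers; pointers stand in for DS:SI and ES:DI, and the loop moves them forward as the CPU does when DF=0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SEG_BYTES 0x10000u                    /* one 64KB segment */

/* Mirrors 'rep movsw' with DF=0: src ~ DS:SI, dst ~ ES:DI, count ~ CX. */
static void rep_movsw(uint16_t *dst, const uint16_t *src, uint16_t count) {
    while (count != 0) {                      /* 'rep' repeats until CX == 0 */
        *dst++ = *src++;                      /* one 'movsw'; both pointers move forward */
        count--;
    }
}

/* Copies one whole arena; returns 1 if the destination now matches the source. */
static int copy_arena(uint16_t *dst, const uint16_t *src) {
    assert(SEG_BYTES / 2 == 0x8000u);         /* the word count loaded into CX */
    rep_movsw(dst, src, (uint16_t) (SEG_BYTES / 2));
    return memcmp(dst, src, SEG_BYTES) == 0;
}
```

Copying words rather than bytes halves the number of iterations, and 0x8000 still fits in the 16-bit CX register, whereas a byte-count of 0x10000 would not.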
Segment-Descriptor Format
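Upper doubleword (bits 63..32):
Bits 63..56: Base[31..24]
Bit 55: G (granularity)
Bit 54: D (default operand-size)
Bit 53: RSV (reserved)
Bit 52: AVL (available for software)
Bits 51..48: Limit[19..16]
Bit 47: P (present)
Bits 46..45: DPL (descriptor privilege level)
Bit 44: S (1 = code or data segment)
Bit 43: X (1 = executable)
Bit 42: C/D (conforming, or expand-down)
Bit 41: R/W (readable, or writable)
Bit 40: A (accessed)
Bits 39..32: Base[23..16]
Lower doubleword (bits 31..0):
Bits 31..16: Base[15..0]
Bits 15..0: Limit[15..0]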
The segment-descriptor’s ‘Present’ bit is bit-number 47
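Packing and testing these fields is a matter of shifts. The sketch below (hypothetical helper names) builds a descriptor, tests its P-bit, and marks it present, which is the change the demo’s fault-handler must make. The example access byte 0x1A describes a readable, non-conforming code segment at privilege level 0 with P=0; setting bit 47 turns it into the familiar 0x9A.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_P (1ull << 47)                   /* the 'Present' bit */

/* access = bits 47..40 (P DPL S X C/D R/W A); flags = bits 55..52 (G D RSV AVL). */
static uint64_t make_descriptor(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags) {
    assert(limit <= 0xFFFFF && flags <= 0xF);
    uint64_t d = 0;
    d |= (uint64_t) (limit & 0xFFFF);                 /* bits 15..0  : Limit[15..0]  */
    d |= (uint64_t) (base & 0xFFFF) << 16;            /* bits 31..16 : Base[15..0]   */
    d |= (uint64_t) ((base >> 16) & 0xFF) << 32;      /* bits 39..32 : Base[23..16]  */
    d |= (uint64_t) access << 40;                     /* bits 47..40 : access byte   */
    d |= (uint64_t) ((limit >> 16) & 0xF) << 48;      /* bits 51..48 : Limit[19..16] */
    d |= (uint64_t) flags << 52;                      /* bits 55..52 : G D RSV AVL   */
    d |= (uint64_t) ((base >> 24) & 0xFF) << 56;      /* bits 63..56 : Base[31..24]  */
    return d;
}

/* Reassembles the 32-bit base from its three pieces. */
static uint32_t descriptor_base(uint64_t d) {
    return (uint32_t) (((d >> 16) & 0xFFFFFF) | ((d >> 56) & 0xFF) << 24);
}

static int      is_present(uint64_t d)   { return (d & DESC_P) != 0; }
static uint64_t mark_present(uint64_t d) { return d | DESC_P; }
```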
In-class exercise
• To get some practical ‘hands on’ experience with
implementing the demand-loading concept we
suggest the following exercise:
Modify our ‘notready.s’ demo so that it uses a 32-bit
Interrupt-Gate for its Segment-Not-Present entry
in the Interrupt Descriptor Table (this will affect the
layout of the fault-handler’s stack)
• You may need to abandon use of the ‘enter’ and
‘leave’ instructions unless you also use a 32-bit
data-segment descriptor for your stack-segment
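As a starting point for the exercise, here is the stack layout a 32-bit interrupt-gate produces for a fault with an error-code, written as a C structure. It assumes the fault occurs at the same privilege level as the handler (otherwise the CPU also pushes the old SS and ESP above EFLAGS); `struct frame32` and `frame32_selector()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fault with an error-code through a 32-bit interrupt-gate, no privilege change.
 * Every slot is 4 bytes wide. */
struct frame32 {
    uint32_t error_code;   /* SS:ESP+0  */
    uint32_t eip;          /* SS:ESP+4  */
    uint32_t cs;           /* SS:ESP+8  (only the low 16 bits are meaningful) */
    uint32_t eflags;       /* SS:ESP+12 */
};

/* Extracts the code-segment selector from its 32-bit stack slot. */
static uint16_t frame32_selector(const struct frame32 *f) {
    return (uint16_t) (f->cs & 0xFFFFu);
}
```

Because every slot is now 4 bytes, the handler would discard the error-code with ‘add $4, %esp’ rather than ‘add $2, %sp’ before its ‘iret’.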