Efficient Software-Based Fault Isolation
Authors: Robert Wahbe, Steven Lucco, Thomas E. Anderson, Susan L. Graham
Presenter: Gregory Netland

The Big Picture
• Two main things:
  – Loading the distrusted code into its own fault domain
    • Only the distrusted code
    • Cheaper RPC across fault domains
  – Modifying the object code so it cannot jump to an area it is not supposed to

Introduction – More Big Picture

Definitions and such
• 1993 – what was happening: 133 MHz, WTC, microkernels
• Sandboxing – only slightly increases execution time
• Modifying object code
• Fault domains – a logically separate portion of the application’s address space, with a unique identifier that is used to control its access

Examples of Programs that could cause problems
• PostGres – queries with extension code can touch data that is not their own, or damage the database in general
• BSD (three areas)
• Microsoft’s Object Linking and Embedding
• As more things are moved to the user level, more third-party code can interfere with kernel operations - ?

Other Examples
• Unix vnode interface – easy to add a file system
• I/O / Active Messages – compiled into the kernel for reasonable performance
• Quark Xpress – extension modules can corrupt its data structures

What does this show
• A significant portion of time is spent in operating system context switch code
• Only a small amount of code is distrusted

Why use Software and not Hardware
• Hardware context switching does not scale with processor integer performance
• High cost for address space switching
  – RPC example: requires at least
    • A trap into the operating system kernel
    • Copying each argument from the caller to the callee
    • Saving and restoring registers
    • Switching hardware address spaces
    • Possibly flushing the TLB
    • A trap back to user level
    • Rinse and repeat

Software Enforced Fault Isolation
• Going back to the big picture, we have to locate where faults occur in a software module, and then we can look at sandboxing
• Cut up the virtual address space into separate chunks

Segment Identifier
• Divide an application’s virtual address space into segments
• All virtual addresses within a segment share the same pattern of upper bits (the segment identifier)
• A fault domain has two segments
  – One for the distrusted module’s code
  – The other for its heap, stack, and static data

Software Encapsulation
• Distrusted code can only jump within its own segment and write to its own segment, which is enforced through the segment identifiers
• All legal jump targets have the same upper bit pattern
• This doesn’t solve all problems; the OS will still need to catch illegal accesses, such as those to unmapped pages
• We have two techniques for this…

Segment matching
• Insert checking code before any unsafe instruction (unsafe means not statically verifiable: jumps through registers, such as procedure returns, and stores that take their target address from a register)
• Is the target in the correct segment?
• If not, trap to a system error routine
• Uses 4 dedicated registers, but this is not a problem based on their tests
• Can pinpoint the offending instruction, and is therefore better for development
• Or can skip the pinpointing for efficiency

Or Sandboxing
• This sets the upper bits of the target address to the correct segment identifier
• This doesn’t catch an illegal address; it prevents it from leaving the segment
• Verifiable
• Takes 5 dedicated registers instead of 4

Optimizations
• Register + offset and guard zones – this avoids unneeded arithmetic to compute target addresses. They sandbox reg and not reg + offset to save an instruction.
• The MIPS stack pointer is treated as a dedicated register, and it is only sandboxed when it is set
• Transformation tool to remove sandboxing from loops

Process Resources
• Need to stop multiple fault domains that share the same virtual address space from corrupting per-address-space resources
  – Let the operating system know about fault domains
  – Or go through cross-fault-domain RPC

Data Sharing
• Can’t work the same way the hardware implementation does, because the hardware solution manipulates page table entries in a different way.
• Read-only sharing is not a problem, because fault domains can read any memory within the address space
• Read-write sharing through lazy pointer swizzling
  – Modify the hardware page tables to map the shared memory region into every address space segment that needs access
  – Addresses are automatically translated into the domain’s own segment through sandboxing
• Another option is shared segment matching: dedicated registers hold a bitmap that tells which segments the fault domain can access

Implementation and Verification
• Unsafe regions are areas of code that modify a dedicated register
• The verifier checks that the dedicated register is valid upon exiting the region
• Disadvantages
  – Most modified compilers only support one language
  – Compiler and verifier must be synchronized
  – Binary patching can fix these things, but it is not yet robust enough

Binary patching
• The system can encapsulate the module by directly modifying its object code
• Unfortunately, a good technique for this does not exist yet
• A jump table entry is a legal address outside the fault domain
• Kept in read-only memory, so it can only be modified by trusted code
• Stubs are trusted, run unprotected, and are responsible for copying cross-domain arguments

Fast communication between fault domains
• Calling a trusted stub outside of your domain
  – Jump table – only modified by trusted code, and a legal entry point outside the domain
• Arguments are passed between fault domains
  – Because the stubs are trusted, they can copy directly to the target domain
• Fault isolation
  – In a cross-domain call, the registers used by the caller and possibly modified by the callee are protected
  – Switch execution stack
  – Validate registers
  – Establish register context for encapsulation
• Errors
  – Addressing violations, loops, etc.

Results
• How much overhead?
• How fast is a cross-fault-domain RPC?
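
The overhead being asked about comes from the extra instructions inserted before each unsafe store or jump. As a rough illustration of the two encapsulation techniques from the earlier slides (segment matching and sandboxing), here is a minimal sketch in Python rather than real machine instructions. The 32-bit address width, the 8-bit segment identifier, and the names segment_match, sandbox, and SegmentFault are assumptions made for this sketch and do not come from the paper.

```python
ADDR_BITS = 32                         # assumed address width
SEG_SHIFT = 24                         # assumed: upper 8 bits are the segment identifier
ADDR_MASK = (1 << ADDR_BITS) - 1
OFFSET_MASK = (1 << SEG_SHIFT) - 1     # keeps the low 24 bits of an address


class SegmentFault(Exception):
    """Stands in for the trap to the system error routine."""


def segment_match(addr, segment_id):
    """Segment matching: check the upper bits and trap on a mismatch."""
    if (addr & ADDR_MASK) >> SEG_SHIFT != segment_id:
        raise SegmentFault(hex(addr))  # the offending address can be reported
    return addr


def sandbox(addr, segment_id):
    """Sandboxing: overwrite the upper bits with the segment identifier."""
    return (addr & OFFSET_MASK) | (segment_id << SEG_SHIFT)


DATA_SEGMENT = 0x12                    # hypothetical segment identifier
inside = 0x12003400                    # address already inside the segment
outside = 0x7F003400                   # address in some other segment

# A legal address passes through both techniques unchanged.
print(hex(segment_match(inside, DATA_SEGMENT)))   # 0x12003400
print(hex(sandbox(inside, DATA_SEGMENT)))         # 0x12003400

# An illegal address is silently forced into the segment by sandboxing.
print(hex(sandbox(outside, DATA_SEGMENT)))        # 0x12003400
```

Calling segment_match(outside, DATA_SEGMENT) raises SegmentFault instead, which mirrors the trade-off on the slides: segment matching can pinpoint the bad instruction, while sandboxing only guarantees that the access stays inside the fault domain.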
Related Work
• At this time it was typical to build microkernel operating systems with separate address spaces
  – This meant heavy IPC with untrusted domains
  – Performance problems
• Some people loaded modules into the kernel address space (co-location)
  – Which means choosing performance over protection
• So, this paper is trying to get both performance and protection.
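
As a backup illustration of how performance is recovered, the guard-zone optimization from the Optimizations slide can be sketched as follows: only the base register is sandboxed, the offset is added afterwards, and unmapped guard zones on either side of the segment absorb any overshoot. This is a sketch under stated assumptions, not the paper's implementation: the 8-bit segment identifier, the 32 KB guard-zone size (chosen to cover a signed 16-bit offset), and the names store_target and classify are all hypothetical.

```python
SEG_SHIFT = 24                     # assumed: upper 8 bits are the segment identifier
SEG_SIZE = 1 << SEG_SHIFT
OFFSET_MASK = SEG_SIZE - 1
GUARD = 1 << 15                    # assumed guard-zone size on each side (32 KB)


def sandbox(reg, segment_id):
    """Force the upper bits of reg to the segment identifier."""
    return (reg & OFFSET_MASK) | (segment_id << SEG_SHIFT)


def store_target(reg, offset, segment_id):
    """Sandbox only reg, then add the offset as the store instruction would."""
    if not -GUARD <= offset < GUARD:
        raise ValueError("offset does not fit the assumed instruction format")
    return sandbox(reg, segment_id) + offset


def classify(addr, segment_id):
    """Say where an address lands relative to the segment and its guard zones."""
    base = segment_id << SEG_SHIFT
    if base <= addr < base + SEG_SIZE:
        return "segment"
    if base - GUARD <= addr < base + SEG_SIZE + GUARD:
        return "guard zone"        # unmapped, so the hardware would fault here
    return "other domain"


SEG = 0x12                         # hypothetical segment identifier
print(classify(store_target(0x12000010, 8, SEG), SEG))       # segment
print(classify(store_target(0x7FFFFFFF, 32767, SEG), SEG))   # guard zone
print(classify(store_target(0x12000000, -32768, SEG), SEG))  # guard zone
```

Because the offset is bounded by the guard-zone size, reg + offset can never reach another fault domain, so the separate add-then-sandbox step can be dropped.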