The Performance of μ-Kernel-Based Systems (H. Härtig, M. Hohmuth, J. Liedtke, S. Schönberg, J. Wolter)

Outline of the Paper
• Introduction
• Overview of L4
• Design and Implementation of the Linux Server
• Evaluating Compatibility Performance
• Evaluating Extensibility Performance
• Alternative Concepts from a Performance Point of View
• Conclusion
Introduction
 Motivation: Microkernel-based systems have been found too slow.
 Goal: Show that microkernel-based systems can be practical with good performance.
 Method:
• Conduct experiments on L4, a lean second-generation microkernel, with Linux running on top of it.
• The resulting system is called L4Linux.
• Compare the performance of L4Linux to native Linux and to MkLinux, a Linux running on a Mach-derived first-generation microkernel.
L4 Essentials
• Based on two concepts: address spaces and threads.
• Address spaces are constructed recursively by user-level servers, called pagers, outside the kernel.
 The initial address space represents physical memory.
 Further address spaces are created by granting, mapping, and unmapping flexpages.
 Flexpages are logical pages of size 2^n, ranging from one physical page up to an entire address space.
 Pagers act as main-memory managers, enabling the user-level implementation of memory-management policies.
• Threads are activities executing inside an address space; each thread can be dynamically associated with an individual pager.
• IPC refers to cross-address-space communication.
• I/O ports are treated as part of the address space.
• Hardware interrupts are handled as IPC.
Linux Design and Implementation
• L4 is implemented on the Pentium, Alpha, and MIPS architectures.
• Linux has architecture-dependent and architecture-independent parts.
• All modifications are confined to the architecture-dependent part.
• The Linux application binary interface is unmodified, so native Linux binaries run unchanged.
• No Linux-specific modifications are made to L4.
The Linux Kernel
• On booting, the Linux server requests memory from its pager, which maps physical memory into the server’s address space.
• The Linux server then acts as the pager for the user processes it creates.
• Hardware page tables are kept inside L4 and cannot be accessed directly at user level, so the Linux server keeps additional logical page tables of its own.
• A single L4 thread is multiplexed by L4Linux to handle system calls and page faults.
• Interrupts are disabled for synchronization in critical sections.
Interrupt Handling and Device Drivers
• Interrupt handlers in native Linux are subdivided into top halves (run immediately) and bottom halves (run later).
• L4 maps hardware interrupts into messages.
• Top-half interrupt handlers are implemented as threads waiting for such messages, one thread per interrupt source.
• A single additional thread handles all bottom halves once the corresponding top half has completed.
Linux User Processes
• Each Linux user process is implemented as an L4 task.
• The task is created by the Linux server, which associates it with a pager.
• L4 converts any page fault of a Linux user process into an RPC and sends it to the Linux server.
• The server replies by mapping (or unmapping) pages of its own address space into the address space of the faulting process.
System Call Mechanisms
• L4Linux system calls are implemented as RPCs between user processes and the Linux server.
• There are three system-call interfaces:
1. A modified version of libc.so that uses L4 IPC primitives to call the Linux server.
2. A corresponding modified libc.a.
3. A user-level exception handler that emulates the system-call trap instruction by calling the corresponding routine in the modified shared library.
• TLB flushes are avoided: L4Linux uses physical copyin and copyout to exchange data between kernel and user processes, instead of address translation by hardware.
Signaling
• The native Linux kernel signals user processes by manipulating their stack, stack pointer, and program counter.
• In L4Linux, each user process has an additional signal-handler thread.
• Upon receiving a signal message from the Linux server, the signal-handler thread causes the user process’s main thread to save its state and enter the Linux server; the server then resumes the main thread.
Scheduling
• All threads are scheduled by L4’s internal scheduler.
• The Linux server’s schedule() operation is used only for multiplexing the Linux server thread across coroutines when concurrent system calls are made.
• The number of coroutine switches is minimized by sleeping until a new system call or a wakeup message is received.
Supporting Tagged TLBs or Small Spaces
• A tagged TLB avoids the TLB flushes that untagged TLBs require on address-space switches.
• However, TLB conflicts can have the same effect as TLB flushes, due to the extensive use of shared libraries and the identical virtual allocation of code and data across address spaces.
• In L4Linux, a special library permits customization of the code and data addresses of an application.
• The emulation library and signal-handler thread can also be mapped close to the application.
• Thus, servers executing in small address spaces can be built.
Compatibility Performance
 Three questions:
 What is the penalty of using L4Linux instead of native Linux? Answered by running benchmarks on native Linux and L4Linux on the same hardware.
 Does the performance of the underlying microkernel matter? Answered by comparing L4Linux to MkLinux.
 How much does co-location improve performance? Answered by comparing user-mode L4Linux to the in-kernel (co-located) version of MkLinux.
Micro Benchmarks
• Used to analyze the detailed behavior of L4Linux mechanisms.
• getpid, the shortest system call, was measured by repeating it in a tight loop.
• The lmbench benchmark suite measures system calls, context switches, memory accesses, pipe operations, networking operations, etc.
• hbench is a revised version of lmbench.
Macro Benchmarks
• Measure the system’s overall performance.
• The time needed to recompile the Linux server under L4Linux was 6-7% slower than under native Linux and 10-20% faster than under both MkLinux versions.
• The commercial AIM multiuser benchmark was used for a more systematic evaluation.
• System performance under different application loads was measured.
Compatibility Performance Analysis
• The current implementation of L4Linux comes close to native Linux even under high load, with penalties ranging from 5-10%.
• Both the macro and micro benchmarks show that the performance of the microkernel matters.
• All benchmarks suggest that co-location by itself does not improve performance.
Extensibility Performance
 Main advantage of a microkernel: extensibility/specialization.
 Three questions:
1. Can we add services outside L4Linux to improve performance by specializing Unix functionality?
2. Can we improve certain applications by using native microkernel mechanisms in addition to the classical API?
3. Can we achieve high performance for non-classical, Unix-incompatible systems coexisting with L4Linux?
 These three questions are answered through specific examples.
Pipes and RPC
Four variants of data exchange are compared:
1. The standard pipe mechanism.
2. Asynchronous pipes on L4, which run only on L4 and need no Linux kernel.
3. Synchronous RPC, which uses blocking IPC directly without buffering data.
4. Synchronous mapping RPC, where the sender maps pages into the receiver’s address space.
 lmbench was used to measure latency and bandwidth.
Cache Partitioning
• L4’s hierarchical user-level pagers allow the L4Linux memory system and a dedicated real-time memory system to run in parallel.
• In real-time systems, worst-case execution time is the optimization criterion.
• A memory manager on top of L4 partitions the cache between multiple real-time tasks to minimize cache-interference costs.
• The time for a matrix multiplication was measured:
1. Uninterrupted: 10.9 ms.
2. Interrupted by a task causing cache conflicts: 96.1 ms.
3. With cache partitioning avoiding secondary-cache interference: 24.9 ms.
Virtual Memory Operations
• The time taken (in microseconds) for selected memory operations under L4 and native Linux is compared:

  Operation    L4    Linux
  Fault        6.2   n/a
  Trap         3.4   12
  Appel1       12    55
  Appel2       10    44
Extensibility Performance Analysis
• Unix-compatible functionality can be improved by using microkernel primitives, e.g., pipes and VM operations.
• Unix-compatible or partially compatible functions can be added to the system that outperform implementations based on the Unix API, e.g., RPC and user-level pagers for VM operations.
• The microkernel offers possibilities for coexisting systems based on different paradigms, e.g., the real-time system with its cache-partitioning memory manager.
Alternative Basic Concepts
 Can a mechanism at a lower level than IPC, or a grafting model, improve the performance of a microkernel?
 Protected Control Transfer (PCT)
• A parameterless cross-address-space procedure call through a callee-defined gate.
• The times for PCT and IPC were compared; PCT offers no significant improvement.
 Grafting
• Downloading extensions into the kernel.
• Its performance impact is still an open question.
Conclusion
• The performance of L4 is significantly better than that of first-generation microkernels.
• The throughput of L4Linux is only 5% less than native Linux, whereas first-generation microkernel systems were 5-7 times worse than native Linux.
• Overall system performance does depend on the performance of the microkernel.
• Modifications to Linux to better suit L4 would further improve performance.