Towards a User-Mode Approach to Partitioned Scheduling in the seL4 Microkernel
Mikael Åsberg and Thomas Nolte
MRTC/Mälardalen University

Outline
● Introduction
● Problem formulation
● General solution
● Contribution summary
● Background
● Related work
● Implementation
● Results
● Conclusion

Introduction
● Increasing software complexity = increased integration
[Figure: Shift to software complexity]

Problem formulation
● Increasing software complexity implies a higher degree of software integration
● This can be observed in the automotive industry (AUTOSAR)
● More unpredictable "interference" (difficult to analyze) when software executes together

General solution
● Solution: separate software into "partitions"
● Seems to be the way to go in industry: ARINC 653, PikeOS, etc.
● Advocated by researchers in our community: Secure Embedded L4 (seL4)
● Partitioning facilitates:
  ● Certification/verification
  ● Reusability (clear interfaces)
  ● Controlled interference between applications
  ● Isolation of CPU/memory overruns within each partition

Contribution summary
● An implementation of a one-level partition scheduler in the seL4 microkernel user-space
● Why:
  ● seL4 (currently) lacks proper time partitioning
  ● It's a difficult trade-off between flexibility and performance
● Investigate the performance at user-space
  ● It's flexible but potentially bad in performance
● The results will reveal if a "verified" user-space scheduler is reasonable to develop for seL4

Background
● Partitioned/hierarchical scheduling
● seL4: a fully verified microkernel (machine-checked proof of ~9000 LOC)
[Figure: seL4 system architecture. Trusted components (device driver, real-time application, …) and untrusted components (Linux, device driver, …) run on top of the seL4 microkernel, which runs on the hardware.]

Related work
● Partitioned/hierarchical scheduler implementations
  ● Wang et al. (1999), Linux
  ● Oikawa et al. (1999), Linux
  ● Kim et al. (2000), SPIRIT-uKernel
  ● Regehr et al. (2001), Windows 2000
  ● …VxWorks, uC/OS-II, FreeRTOS…
  ● Yang et al. (2011), L4/Fiasco
● Verified scheduler implementations
  ● Muller et al. (2002/2004), Bossa/Linux
  ● Ha et al. (2004), DEOS kernel
  ● Åsberg et al. (2011), VxWorks

Implementation (1/3)
● Time partitioning in seL4:
  ● Privileged mode:
    ● + Good performance (fast scheduling decisions)
    ● - Poor flexibility (requires re-verification of the kernel)
  ● User mode:
    ● + Flexibility (re-verify only a user-space module)
    ● - Bad performance (extra overhead of scheduling decisions)
How bad is it?

Implementation (2/3)
● Scheduling in seL4: FPPS and round robin
● Proposed scheduling: EDF with periodic partitions

Implementation (3/3)
● Implementation details:
  ● The scheduler is implemented as a user-space thread
  ● Highest priority (root thread)
  ● Triggered by periodic interrupts relayed from the seL4 kernel
  ● Time between interrupts is the scheduler resolution
  ● Scheduled threads are activated/deactivated using seL4 thread-management API functions
  ● Complexity O(1) for release and deadline queues (using bitmaps)

Results (1/3)
● Hardware/software setup:
  ● seL4 kernel (version 1.1)
  ● seL4 emulated on QEMU (version 0.13.91)
  ● QEMU settings: Intel Pentium III, 533 MHz (Katmai, model 7, stepping 3)
  ● Time measurements using RDTSC (x86 time-stamp counter instruction)

Results (2/3)
● Average scheduler overhead with 2-9 partitions:
  ● Scheduler invocation: ~213 µs
  ● seL4 context switch: ~109 µs
● Comparison to related work:

Measurement         Platform                  Time (µs)
PS (with rollb.)    Intel P3 533MHz (seL4)    346
PS                  Intel P3 533MHz (seL4)    213
Context switch      Intel P3 533MHz (seL4)    109
Set timer [40]      AMD 2GHz (L4/Fi.)         236
System call [8]     ARM-A8 800MHz (seL4)      20
Int. delivery [8]   ARM-A8 800MHz (seL4)      59/318
IPC [16]            ARM-11 416MHz (L4/Fi.)    35/54
Table 1: Overhead comparison.

… ARM1176 (416MHz) processor (with L4/Fiasco). The comparisons are summarized in Table 1. Conclusively, it is difficult to draw any final conclusions from our measurements. The comparisons we have made relate to general system overheads in the seL4 and L4/Fiasco kernels. Based on this, the overhead of the PS (without rollback) does not seem overwhelming, i.e., this overhead is at least not orders of magnitude larger than general system overheads in seL4/L4/Fiasco kernels.

Results (3/3)
[Figure 9: Execution trace of the PS (with rollback) and a context switch in seL4. Rows: task3, PS, cs, idle; time axis 0 to 40.]
[Figure 10: Execution trace of a set of threads scheduled by the PS (with rollback) in seL4. Rows: task1, task2, task3, PS, cs; time axis 0 to 95000.]

Conclusion
● Is the scheduler performance bad, i.e., too much overhead?
  ● Well, it's not "much" larger than related overheads, i.e., context switches, system calls, interrupt latency, etc.
  ● 2x seL4 context switches, 4x IPC calls, 2.5x interrupt latency in a closed system (limitations on API calls)
  ● If further optimizations could squeeze down the overhead a bit, then it could be a promising approach
● Future work:
  ● Optimize the implementation
  ● Perhaps develop a verified version

Thank you!
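
Backup: bitmap queue sketch

The Implementation (3/3) slide states that the release and deadline queues have O(1) complexity thanks to bitmaps, but gives no further detail. The sketch below shows one common way to get that behaviour and is not taken from the authors' implementation: the class name `BitmapQueue`, the 64-slot width, and the idea of using a time slot as the key are all assumptions made for illustration. Each bit marks a non-empty bucket, so finding the earliest key reduces to locating the lowest set bit of a fixed-width word.

```python
from collections import deque


class BitmapQueue:
    """Bucket queue indexed by a fixed-width bitmap (hypothetical layout)."""

    def __init__(self, slots=64):
        self.slots = slots
        self.bitmap = 0  # bit k is set exactly when bucket k is non-empty
        self.buckets = [deque() for _ in range(slots)]

    def insert(self, key, item):
        """Add item under key (e.g. a deadline slot); constant time."""
        if not 0 <= key < self.slots:
            raise ValueError("key outside bitmap range")
        self.buckets[key].append(item)
        self.bitmap |= 1 << key

    def min_key(self):
        """Return the smallest key in use, or None if the queue is empty."""
        if self.bitmap == 0:
            return None
        lowest = self.bitmap & -self.bitmap  # isolate the lowest set bit
        return lowest.bit_length() - 1

    def pop_min(self):
        """Remove and return (key, item) for the smallest key, or None."""
        key = self.min_key()
        if key is None:
            return None
        item = self.buckets[key].popleft()
        if not self.buckets[key]:
            self.bitmap &= ~(1 << key)  # bucket drained: clear its bit
        return key, item


# Usage: three hypothetical partitions with deadlines at slots 12, 5 and 30.
deadlines = BitmapQueue()
deadlines.insert(12, "partition_B")
deadlines.insert(5, "partition_A")
deadlines.insert(30, "partition_C")
first = deadlines.pop_min()
```

In a C implementation the lowest-set-bit step would typically map to a single find-first-set instruction, which is what keeps the lookup independent of the number of queued partitions. The price is a bounded key range: keys must fit within the bitmap width.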