Kernel Synchronization
Examples From the Linux Kernel
Michael E. Locasto

Kernel control flow is a complicated, asynchronous interleaving.

BIG PICTURE: HOW CAN THE KERNEL CORRECTLY SERVICE REQUESTS?

Main Ideas / Concepts
  Atomic operations in x86
  Kernel locking / synchronization primitives
  Kernel preemption
  Read-Copy-Update
  The “big kernel lock”

Kernel Preemption
  With kernel preemption, the kernel can preempt a running kernel control path (whether it runs on behalf of a user process or a kernel thread).
  Acquiring a spinlock automatically disables kernel preemption (as we will see in the code).

Synchronization Primitives
  Atomic operations
  Disable interrupts (cli/sti modify IF of eflags)
  Lock memory bus (x86 lock prefix)
  Spin locks
  Semaphores
  Sequence Locks
  Read-copy-update (RCU) (lock free)

Barriers
  Barriers are serializing operations; they “gather” and make operations sequential.
  Memory barrier:
    x86 in/out on I/O ports
    x86 lock prefix
    x86 writes to CReg, SReg/eflags, DReg

  x86 instr   meaning
  lfence      read barrier
  sfence      write barrier
  mfence      r/w barrier

Barrier Implementation

Motivating Example: Using Semaphores in the Kernel
  What are down_read, up_read, and mmap_sem?
  Let’s start with the data structure and see where that leads…

START WITH THE DATA STRUCTURE: MM->MMAP_SEM
  current->mm->mmap_sem
  struct mm_struct: include/linux/mm_types.h

PRIMITIVE ONE: ATOMIC TYPE AND OPERATIONS
  On x86, these operations are atomic:
    simple asm instructions that involve 0 or 1 aligned memory access
    read-modify-write instructions (e.g., inc, dec), provided no other processor takes the memory bus between the read and the write (always true on a uniprocessor)
    anything prefixed by the IA-32 ‘lock’ prefix
  atomic_t: include/linux/types.h

Example: Reference Counters
  Refcounts: an atomic_t associated with a resource that counts the kernel control paths accessing that resource.

PRIMITIVE TWO: SPINLOCKS
  /include/linux/spinlock_types.h

    typedef struct spinlock {
        struct raw_spinlock rlock;
    } spinlock_t;

    typedef struct raw_spinlock {
        arch_spinlock_t raw_lock;
    } raw_spinlock_t;
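To make the busy-wait idea behind spinlocks concrete, here is a minimal user-level sketch built on C11 atomic_flag. It is not the kernel's arch_spinlock_t implementation: the names demo_spinlock_t and demo_spin_* are hypothetical, and the sketch does not model the preemption disabling that a real kernel spinlock performs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-level spinlock; NOT the kernel's arch_spinlock_t. */
typedef struct {
    atomic_flag locked;   /* clear = unlocked, set = locked */
} demo_spinlock_t;

/* Put the lock into the unlocked state. */
static void demo_spin_init(demo_spinlock_t *l)
{
    atomic_flag_clear(&l->locked);
}

/* Single acquisition attempt; returns true if the lock was taken. */
static bool demo_spin_trylock(demo_spinlock_t *l)
{
    /* test_and_set atomically sets the flag and returns its previous value,
     * so a previous value of "clear" means we are the new owner. */
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Busy-wait (spin) until the lock is acquired. */
static void demo_spin_lock(demo_spinlock_t *l)
{
    while (!demo_spin_trylock(l))
        ;   /* spin: the caller never sleeps */
}

/* Release the lock so another spinner can take it. */
static void demo_spin_unlock(demo_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A caller would bracket a short critical section with demo_spin_lock() and demo_spin_unlock(). The atomic test-and-set is the same kind of indivisible read-modify-write operation described under PRIMITIVE ONE.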
  arch/x86/include/asm/spinlock_types.h#L10
  slock=1 (unlocked), slock=0 (locked)

spinlock API (partial)
  /include/linux/spinlock.h
  /kernel/spinlock.c
  include/linux/spinlock_api_smp.h

Linux Tracks Lock Dependencies @ Runtime

PRIMITIVE THREE: SEMAPHORES
  Here we mainly consider Read/Write Semaphores.

Important Caveats about Kernel Semaphores
  Unlike spinlocks, semaphores put the invoking process to sleep instead of busy-waiting.
  As a result, kernel semaphores should only be used by functions that can safely sleep (i.e., not interrupt handlers).

might_sleep()
  leads (eventually) to:

rwsem_wake

__rwsem_do_wake
  On our way out, allow a writer at the front of the waiting queue to proceed. Then allow unlimited numbers of readers to access the critical region.

Advanced Techniques

Sequence Locks
  A solution to the multiple-readers/writer problem in which a writer is permitted to advance even if readers are in the critical section.
  Readers must check the sequence counter on both entry and exit to see whether the data has been modified underneath them; if it has, they retry the read.

Read-Copy-Update (RCU)
  Designed to protect data structures accessed by multiple CPUs; allows many readers and writers.
  The basic idea is simple (and in the name). Readers access the data structure via a pointer; writers initially act as readers and create a copy to modify. “Writing” is just a matter of updating the pointer.

RCU
  Only for kernel control paths; disables preemption.
  Used to protect data structures accessed through a pointer: by adding a layer of indirection, we can reduce wholesale writes/updates to a single atomic write/update.
  Heavy restrictions:
    RCU tasks cannot sleep
    readers do little work
    writers act as readers, make a copy, then update the copy; finally, they rewrite the pointer
    cleanup is correspondingly complicated
  http://lxr.linux.no/#linux+v2.6.35.14/kernel/timer.c#L1354

RCU EXAMPLE: GETPPID(2)

Does synchronization impose a significant cost?
(test at user level)

EXERCISE: TIME PERFORMANCE COST OF SYNCHRONIZATION

CODE: AUTOMATICALLY DRAWING RESOURCE GRAPHS
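For the timing exercise, one possible user-level starting point is to compare a plain increment loop with the same loop using atomic read-modify-write operations (lock-prefixed on x86). This is only a sketch: the function names count_plain, count_atomic, time_it, and report are hypothetical, the measurement is single-threaded (so it shows the cost of the atomic instruction itself, not of contention), and clock() measures CPU time at coarse resolution.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

/* Increment a plain counter n times. volatile forces a real memory
 * access on every iteration so the loop is not optimized away. */
static long count_plain(long n)
{
    volatile long counter = 0;
    for (long i = 0; i < n; i++)
        counter++;
    return counter;
}

/* Same loop, but each increment is an atomic read-modify-write. */
static long count_atomic(long n)
{
    atomic_long counter = 0;
    for (long i = 0; i < n; i++)
        atomic_fetch_add(&counter, 1);
    return atomic_load(&counter);
}

/* Run fn(n), store its return value in *result, and return the
 * CPU time it consumed in seconds. */
static double time_it(long (*fn)(long), long n, long *result)
{
    clock_t start = clock();
    *result = fn(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Time both variants for n increments and print the two measurements. */
static void report(long n)
{
    long plain_result, atomic_result;
    double plain_secs = time_it(count_plain, n, &plain_result);
    double atomic_secs = time_it(count_atomic, n, &atomic_result);
    printf("%ld increments: plain %.3f s, atomic %.3f s\n",
           n, plain_secs, atomic_secs);
}
```

Calling report() from main with a large iteration count (for example 100000000L) prints both timings; the ratio depends on the CPU and compiler flags. A natural extension is to run the loops from several threads so that contention on the shared counter shows up in the numbers.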