W4118 Operating Systems
Interrupt and System Call in Linux
Instructor: Junfeng Yang

Logistics
TAs
  Supreeth Subramanya
    • Office Hours: M 3-5pm
    • Address: CEPSR 7LW1
  Yunling Wang
    • Office Hours: W 1-3pm
    • Address: TA room (Mudd 122A)
  Heming Cui
    • Office Hours: F 4-6pm
    • Address: TA room (Mudd 122A)

Logistics (cont.)
Textbooks
  Bookstore is working on the order
  We’ve included the problem statements in homework 1 page
Homework 1 clarifications
  Your shell should wait for command to finish
    While command running, don’t prompt or accept new command
    NOTE: wait for the entire pipeline to finish
  When do IO redirection and pipe conflict?
    Tie two things to one file descriptor
    • Bad: “ls > 1.txt | grep FOO”
    • Bad: “ls | sort < file.txt”
  Different shells handle conflicts differently
  Your shell should emit an error.
    • tcsh emits error. “Ambiguous output redirect.”
    • bash is silent.
Any questions?

Last lecture
OS: event driven
Events from device: interrupt
  Computer organization: CPU, device, memory, bus
  CPU’s “fetch-execute” cycle
  How to start this cycle: boot process
  Devices need CPU’s immediate attention. How? interrupt
  How it works
    • PIC translates IRQs to interrupt #
    • CPU looks up handler in Interrupt Descriptor Table
  Traps (or Exceptions): raised inside CPU

Last lecture (cont.)
Events from application: system call
  Often implemented via trap, e.g. int 0x80 in Linux
The need for protection
  Dual-mode operation: user mode and kernel mode
  Privileged instructions can only execute in kernel mode
  Apps transit into kernel via system calls, so kernel can validate the calls and perform privileged instructions for them
OS structure
  Simple
  Layered

Today
OS structure (cont.)
  Monolithic kernel vs. Microkernel
  Virtual machines
Intro to Linux
Interrupts in Linux
System calls in Linux

Monolithic kernel
All OS components run in kernel mode
[Figure: user mode / kernel mode diagram with APP, FS, Mem, Net]
Why good?
  Can be efficient. Cross-component access cheap
Why bad?
  No boundaries
  Big, complex kernel hard to change
  Trusted computing base (TCB) large; one error can crash or compromise the entire kernel
    • Hard to do new stuff in OS; OS researchers unhappy
    • No flexibility for apps. Hard to customize for speed (web server)

Microkernel
Moves as much as possible from the kernel into “user” space
Restricted interface: no direct memory sharing between modules; need to send messages via kernel
[Figure: user mode / kernel mode diagram with APP, FS, Mem, Net, kernel]
Why good? Claimed advantages:
  Extensibility: new module = new user space program/library
  Flexibility: app can have own FS, Mem, Net, can make them fast
  Portability: easier to port kernel to new hardware
  Reliability & security: each module has own protection domain. If one crashes, just restart it; it can’t affect other modules.

Microkernel (cont.)
Big thing in 90s; best people worked on microkernel
  Students became top school professors
Problem: slow, too many user-kernel crossings
  Can be fixed with fast IPC
However, there remain problems. In the end, either download extensions into kernel, or merge all modules into a library
  Result looks like monolithic kernels, maybe even more complicated!
Today: Windows, Linux, *BSD, MacOS, all monolithic

Some criticism on microkernel
Restricted interface leads to complicated implementation
  • No shared state, hard to manage consistency
Reliability & security: one key module fails, apps fail

Modules
Most microkernel advantages due to modularity
Most modern operating systems implement kernel modules
  Uses object-oriented approach
    • Function pointers in Linux: strawman OOP with C
  Each talks to the others over known interfaces
    • But share one protection domain, so just call function
  Each is loadable as needed within the kernel
Overall, similar to microkernel, but more flexible
[Figure: user mode / kernel mode diagram with APP, FS, Mem, Net]

Virtual Machine
Virtual Machine Monitor (VMM): kernel that provides hardware interface
[Figure: user mode / kernel mode diagram with three APPs, three OSes, and the VMM]
Why good?
  Isolation. Strong protection between VMs
  Consolidation. One physical machine, multiple VMs
  Mobility. Can move VMs around
  Standardization: same hw, so better system mgmt

Virtual Machine (cont.)
Normal operating system environment:
  running in supervisor mode
  full access to machine state and I/O devices
Virtualized guest operating systems:
  running in user mode
  no direct access to machine state
Tasks of the virtual machine monitor:
  reconciling the virtual and physical architecture
  preventing virtual machines from interfering with each other or the monitor
  Do it fast? Not an easy job …

Hosted virtual machines: VMware Desktop Products Architecture

Today
OS structure (cont.)
Intro to Linux
Interrupts in Linux
System calls in Linux

What is Linux?
A modern, open-source OS based on UNIX standards
  1991: written by Linus Torvalds from scratch, 0.1 MLOC
    • major design goal of UNIX compatibility
  Now: many developers worldwide, 10 MLOC
  Unique management model
    • Distributed development, central check in
Linux distributions
  Ubuntu, Debian, Fedora, Redhat, CentOS, Slackware, Mandrake Linux, DreamLinux, SELinux, Gentoo, …
  All based on the Linux kernel, with different sets of applications, package management methods and configurations

Linux Licensing
The Linux kernel is distributed under the GNU General Public License (GPL), the terms of which are set out by the Free Software Foundation
Anyone using Linux, or creating their own derivative of Linux, may not make the derived product proprietary; software released under the GPL may not be redistributed as a binary-only product

Linux kernel structure
Core + dynamically loadable modules
  Modules include: device drivers, file systems, network protocols, etc.
Modules were originally developed to support the conditional inclusion of device drivers
  Early OS kernels would need to either:
    • include code for all possible devices or
    • be recompiled to add support for a new device
Now, modules can be dynamically loaded and unloaded
Modules are used extensively

Linux kernel structure (cont.)
[Figure: layers from Applications and System Libraries (libc) through the System Call Interface to the kernel (I/O related: File Systems, Networking, Device Drivers; process related: Scheduler, Memory Management, IPC; Modules), then Architecture-Dependent Code and Hardware]

Linux source tree
Download: kernel.org (all releases + revision history)
Browse: lxr.linux.no (with cross reference)
Directory structure
  Public header files: include/
  Each component is a subdir (e.g. mm/, ipc/, drivers/)
  Usually interface + common functions + loadable modules

Today
OS structure (cont.)
Intro to Linux
Interrupts in Linux
  How interrupts are implemented in Linux, using x86 as an example
System calls in Linux

Types of Interrupts on 80386
Interrupts: asynchronous, from external devices, not related to code running
  Maskable interrupts
  Nonmaskable interrupts (NMI): hardware error
Exceptions: synchronous, raised by CPU
  Processor-detected exceptions:
    • Faults: correctable; offending instruction is retried
    • Traps: often for debugging; instruction is not retried
    • Aborts: major error (hardware failure), EIP wrong
  Programmed exceptions:
    • Requests for kernel intervention (software intr/syscalls)

Faults
Instruction would be illegal to execute
Examples:
  Writing to a memory segment marked ‘readonly’
  Reading from an unavailable memory segment (on disk): page fault
  Executing a ‘privileged’ instruction
Detected before incrementing the IP
The causes of ‘faults’ can often be ‘fixed’
If a ‘problem’ can be remedied, then the CPU can just resume its execution-cycle

Traps
A CPU might have been programmed to automatically switch control to a ‘debugger’ program after it has executed an instruction
That type of situation is known as a ‘trap’
It is activated after incrementing the IP

Handling Exceptions
Most error exceptions (divide by zero, invalid operation, illegal memory reference, etc.) translate directly into signals
This isn’t a coincidence...
The kernel’s job is fairly simple: send the appropriate signal to the current process
  force_sig(sig_number, current);
That will probably kill the process, but that’s not the concern of the exception handler
One important exception: page fault
An exception can (infrequently) happen in the kernel
  die(); // kernel oops

Interrupt # assignment
Total possible 0-255 Interrupt ID numbers
First 32 reserved by Intel for NMI and exceptions
OS’s such as Linux are free to use the remaining 224 available interrupt ID numbers for their own purposes (e.g., for service-requests from external devices, or for other purposes such as system-calls)
We’ve seen many examples in last lecture
  0: divide-overflow fault
  3: breakpoint
  14: Page-Fault Exception
  128: system call
Called “vector” in ULK

Interrupts in Linux
[Figure: interrupt path. Labels: IRQs, PIC, intr #, INTR, CPU, idtr, Bus, Memory, IDT (entries 0-255), ISR, mask points]
Assign IRQ to dev? IRQ to Intr #?

Assigning IRQs to Devices
IRQ assignment is hardware-dependent
Sometimes it’s hardwired, sometimes it’s set physically, sometimes it’s programmable
PCI bus usually assigns IRQs at boot
Some IRQs are fixed by the architecture
  IRQ0: Interval timer
  IRQ2: Cascade pin for 8259A
Linux device drivers request IRQs when the device is opened
  Especially useful for dynamically-loaded drivers, such as for USB or PCMCIA devices
  Two devices that aren’t used at the same time can share an IRQ, even if the hardware doesn’t support simultaneous sharing

Assigning Interrupt # to IRQs
Intr #: index (0-255) into interrupt descriptor table
Intr #: usually IRQ + 32
  Below 32 reserved for non-maskable intr & exceptions
Maskable interrupts can be assigned as needed
Vector 128 used for syscall
Vectors 251-255 used for Inter-Processor Interrupt (IPI)

Interrupts in Linux
[Figure: interrupt path, same diagram as before]
Multicore?

Multiple Logical Processors
[Figure: multi-core CPU; CPU 0 and CPU 1 each with a LOCAL APIC, both connected to the I/O APIC]
Advanced Programmable Interrupt Controller is needed to perform ‘routing’ of I/O requests from peripherals to CPUs

APIC, IO-APIC, LAPIC
Advanced PIC (APIC) for SMP systems
  Local APIC (LAPIC) versus “frontend” IO-APIC
  Used in all modern systems
  Interrupts “routed” to CPU over system bus
  IPI: inter-processor interrupt
Devices connect to front-end IO-APIC
IO-APIC communicates (over bus) with Local APIC
Interrupt routing
  Allows broadcast or selective routing of interrupts
  Ability to distribute interrupt handling load
  Routes to lowest priority processor
    • Special register: Task Priority Register (TPR)
  Arbitrates (round-robin) if equal priority

Interrupts in Linux
[Figure: interrupt path, same diagram as before]
How to set up IDT?

Interrupt Descriptor Table
The ‘entry-point’ to the interrupt-handler is located via the Interrupt Descriptor Table (IDT)
IDT: “gate descriptors”
  Location of handler
  Descriptor Privilege Level (DPL), prevent bad access
    • Can invoke only when current privilege level (CPL) <= DPL
    • This is just the mode bit for protection
  Gates (slightly different ways of entering kernel)
    • Interrupt gate: disables further interrupts
    • Trap gate: further interrupts still allowed
    • Task gate: includes TSS to transfer to (used when EIP is bad, or hardware failure)

IDT Initialization
Initialized once by BIOS in real mode
Must not expose kernel to user mode access
Linux re-initializes during kernel init
  Start by setting all descriptors to null handler ignore_int()
  Then, set up entries we handle
  E.g. arch/i386/kernel/traps.c, function trap_init()

Linux lingo
Interrupt gate = Intel interrupt, maskable or non-maskable
  no user access (DPL = 0)
  disable interrupt when invoking handler
  E.g. set_intr_gate(2, &nmi)
System gate = Intel trap with user access (DPL = 3) and interrupt enabled
  into (#4), bounds (#5), system call (#128)
  E.g. set_system_gate(4, &overflow)
  Sometimes want to disable interrupt for int3: set_system_interrupt_gate(3, &int3)
Trap gate = Intel trap and fault, no user access (DPL = 0) and interrupt enabled
  E.g. set_trap_gate(0, &divide_error)

Interrupts in Linux
[Figure: interrupt path, same diagram as before]
How to load ISR?

Loading an Interrupt handler
Hardware locates the proper gate descriptor for this interrupt vector, and locates the new context
Verifies Current Privilege Level (CPL) <= Descriptor Privilege Level (DPL)
Load a new stack pointer if needed
Hw saves old IP, etc. on new stack
Set IP, etc. to interrupt handler = invoke handler
  For an interrupt gate, disable interrupt by unsetting IF bit in eflags register
Handler saves old CPU state on new stack

Finding the Proper Handler
On modern hardware, multiple I/O devices can share a single IRQ and hence interrupt vector
First differentiator is the interrupt vector
Multiple interrupt service routines (ISR) can be associated with a vector
Each device’s ISR for that IRQ is called; the determination of whether or not that device has interrupted is device-dependent

Next lecture
Interrupts in Linux (cont.)
System calls in Linux
Process (read OSC ch 3)
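
Sketch (not from the lecture)
The “Loading an Interrupt handler” and “Finding the Proper Handler” slides can be modeled in a few lines of user-space C. This is a toy model, not kernel code: every name in it (struct gate, register_isr, dispatch, software_int, fake_dev, fake_isr, MAX_SHARED) is hypothetical and chosen only for illustration. It assumes the usual IRQ + 32 vector mapping, applies the CPL <= DPL check to software-raised interrupts, and calls every ISR registered on a shared vector so that each one can decide whether its own device interrupted.

```c
#include <assert.h>
#include <stddef.h>

#define NR_VECTORS 256   /* vectors 0-255, as on the slides */
#define IRQ_BASE   32    /* vectors below 32: NMI and exceptions */
#define MAX_SHARED 4     /* hypothetical cap on ISRs sharing a vector */

/* An ISR returns 1 if its device raised the interrupt, else 0. */
typedef int (*isr_fn)(void *dev);

/* Toy stand-in for a gate descriptor plus its chain of ISRs. */
struct gate {
    int    dpl;                 /* 0 = kernel only, 3 = user may invoke */
    int    nr_isrs;
    isr_fn isr[MAX_SHARED];
    void  *dev[MAX_SHARED];
};

static struct gate idt[NR_VECTORS];

/* Intr # is usually IRQ + 32. */
int irq_to_vector(int irq)
{
    return irq + IRQ_BASE;
}

/* Attach one more ISR to a vector; returns 0 on success, -1 if full. */
int register_isr(int vector, int dpl, isr_fn fn, void *dev)
{
    assert(vector >= 0 && vector < NR_VECTORS);
    struct gate *g = &idt[vector];
    if (g->nr_isrs == MAX_SHARED)
        return -1;
    g->dpl = dpl;
    g->isr[g->nr_isrs] = fn;
    g->dev[g->nr_isrs] = dev;
    g->nr_isrs++;
    return 0;
}

/* Call every ISR on the vector; return how many claimed the interrupt. */
int dispatch(int vector)
{
    assert(vector >= 0 && vector < NR_VECTORS);
    struct gate *g = &idt[vector];
    int handled = 0;
    for (int i = 0; i < g->nr_isrs; i++)
        handled += g->isr[i](g->dev[i]);
    return handled;
}

/* Model of "int n" issued at privilege level cpl: allowed only when
 * CPL <= DPL. Returns -1 to stand for a protection fault. */
int software_int(int vector, int cpl)
{
    assert(vector >= 0 && vector < NR_VECTORS);
    if (cpl > idt[vector].dpl)
        return -1;
    return dispatch(vector);
}

/* Hypothetical device with a "pending" flag the ISR can inspect. */
struct fake_dev {
    int pending;
    int serviced;
};

int fake_isr(void *p)
{
    struct fake_dev *d = p;
    if (!d->pending)
        return 0;       /* not our interrupt; let the next ISR look */
    d->pending = 0;
    d->serviced++;
    return 1;
}
```

Usage: register two fake_dev instances on irq_to_vector(5), set pending on one of them, and call dispatch(); only the pending device is serviced. Registering a vector with DPL 3 (like the system call vector 128) lets software_int() succeed from CPL 3, while a DPL 0 vector makes it return -1.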