Interrupts (Hardware)

Interrupt Descriptor Table
• IDT specified as a segment using the IDTR register

[Figure: Calling the IRQ handler]

[Figure: Interrupt Context]

Exceptions
• First 32 IRQ vectors in IDT
  – Correspond to events generated by the CPU
  – Page fault, divide by zero, invalid instruction, etc.
• Full list in the CPU architecture manuals
  – Generally it’s an “error” or “exception” encountered during CPU instruction execution
• IDT is referenced directly by the CPU
  – Completely internalized

External Interrupts
• Events triggered by devices connected to the system
  – Network packet arrivals, disk operation completion, timer updates, etc.
  – Can be mapped to any IRQ vector above the exceptions
    • (IRQs 32-255)
• External because they happen outside the CPU
  – External logic signals the CPU and tells it which handler to execute
  – Managed by the Interrupt Controller
    • Special device included in the southbridge

Interrupt Controllers
• Translate device IRQ signals to CPU IRQ vectors
  – Each device has only a single pin
    • High = IRQ pending, Low = no IRQ
  – Interrupt controller maps devices to vectors
• Two x86 controller classes
  – Legacy: 8259 PIC
    • Connected to a set of default I/O ports on the CPU
  – Modern: APIC + IOAPIC
    • Memory mapped into each CPU’s physical memory
      – (How?)
    • Next-generation APIC (x2APIC) accessed via MSRs
      – Model-specific registers: control registers accessed via special instructions
        » WRMSR, RDMSR

8259 PIC
• Allows the mapping of 8 IRQ pins (from devices) to 8 separate vectors (to CPU)
• Assumes continuous numbering
• Assign the PIC a vector offset
• Each pin index is added to that offset to calculate the vector

8259 Cont’d
• 1 PIC only supports 8 device lines
  – Often more than 8 devices in a system
  – Solution: add more PICs
    • But x86 CPUs only have 1 INTR input pin
• x86 solution:
  – Chain the PICs together (master and slave)
    • Slave PIC is attached to IRQ pin 2 of the master
    • CPU interfaces directly with the master
• PC architecture defines common default devices for each PIC input pin
• The OS (e.g. Linux) sets the vector offsets to 32 (master) and 40 (slave), just above the exceptions

IRQ Example
IRQ   INT   Hardware Device
0     32    Timer
1     33    Keyboard
2     34    PIC cascading
3     35    Second serial port
4     36    First serial port
6     38    Floppy disk
8     40    System clock
10    42    Network interface
11    43    USB port, sound card
12    44    PS/2 mouse
13    45    Math coprocessor
14    46    EIDE first controller
15    47    EIDE second controller

APIC
• Problem: PICs don’t support multiple CPUs
  – Only one INT signal, so only one CPU can receive interrupts
• SMP required a new solution
  – APIC + IOAPIC
  – Idea: separate the responsibility of the PIC into two components
    • APIC = interfaces with the CPU
    • IOAPIC = interfaces with devices

APIC
• Each CPU has its own local APIC
  – In charge of keeping track of interrupts bound for its assigned CPU
  – Since the Pentium Pro, the APIC has been implemented on the CPU die
• APIC interfaces with the CPU’s interrupt pins to invoke the correct IDT vector
  – This is its primary responsibility
• But it does other things as well
  – Timer: the APIC has its own timer device per CPU
    • Legacy PC had a separate timer device on the motherboard
    • Allows each CPU to have its own timer
  – Inter-Processor Interrupt: allows cross-CPU communication
    • One CPU can send an interrupt to another one
    • Why would you want to do this?
    • How does the APIC do this?

ICC bus
• APICs and IOAPICs share a common communication bus
• ICC bus: Interrupt Controller Communication Bus
• Handles routing of interrupts to the correct APIC

IOAPIC
• Connects devices to the ICC bus
  – Must still translate IRQ pins from devices to vectors
  – But now must also select the destination APIC
• Typically has 24 I/O Redirection Table Registers
  – Specifies the vector # to send to the APIC
  – Specifies which APIC (or group of APICs) can accept the IRQ
    • Several methods of specifying APIC addresses
  – Allows masking of IRQ pins

IO-APIC configuration
• Usually initialized to mirror the PIC configuration
  – But as architectures diverge from the legacy PC, this is becoming harder
  – Generally speaking, resource discovery is UGLY
• The OS can then map IO-APIC entries to whichever vector on whichever CPU it wants
  – This means that IRQ vectors can be reused between CPUs

Interrupt Vectors
Vector Range   Use
0-19           Nonmaskable interrupts and exceptions
20-31          Intel-reserved
32-127         External interrupts (IRQs)
128            System call exception
129-238        External interrupts (IRQs)
239            Local APIC timer interrupt
240            Local APIC thermal interrupt
241-250        Reserved by Linux for future use
251-253        Interprocessor interrupts
254            Local APIC error interrupt
255            Local APIC spurious interrupt

IRQ Handling
1. Monitor IRQ lines for raised signals. If multiple IRQs are raised, select the lowest-numbered IRQ.
2. If a raised signal is detected:
   1. Converts the raised signal into a vector (0-255).
   2. Stores the vector in an I/O port, allowing the CPU to read it.
   3. Sends the raised signal to the CPU’s INTR pin.
   4. Waits for the CPU to acknowledge the interrupt.
   5. Kernel runs do_IRQ().
   6. Clears the INTR line.
3. Go to step 1.

do_IRQ
1. Kernel jumps to entry point in entry.S.
2. Entry point saves registers, calls do_IRQ().
3. Finds the IRQ number in the saved registers (the orig_eax slot).
4. Looks up the IRQ descriptor using the IRQ #.
5. Acknowledges receipt of the interrupt.
6. Disables interrupt delivery on the line.
7. Calls handle_IRQ_event() to run handlers.
8. Cleans up and returns.
9. Jumps to ret_from_intr().

handle_IRQ_event()
    fastcall int handle_IRQ_event(unsigned int irq, struct pt_regs *regs,
                                  struct irqaction *action)
    {
        int ret, retval = 0, status = 0;

        /* Handlers registered without SA_INTERRUPT run with local IRQs enabled */
        if (!(action->flags & SA_INTERRUPT))
            local_irq_enable();

        /* Walk every handler registered on this (possibly shared) line */
        do {
            ret = action->handler(irq, action->dev_id, regs);
            if (ret == IRQ_HANDLED)
                status |= action->flags;
            retval |= ret;
            action = action->next;
        } while (action);

        /* Feed interrupt timing to the entropy pool if a handler asked for it */
        if (status & SA_SAMPLE_RANDOM)
            add_interrupt_randomness(irq);
        local_irq_disable();

        return retval;
    }

Interrupt Handlers
• Function the kernel runs in response to an interrupt.
• More than one handler can exist per IRQ.
• Must run quickly.
  – Resume execution of interrupted code.
• How to deal with high-work interrupts?
  – Ex: network, hard disk

Top and Bottom Halves
Top Half
• The interrupt handler.
• Current interrupt disabled, possibly all disabled.
• Runs in interrupt context, not process context.
  – Can’t sleep.
• Acknowledges receipt of the interrupt.
• Schedules the bottom half to run later.
Bottom Half
• Runs later, with interrupts enabled.
• Performs most of the work required.
• Softirqs and tasklets run in interrupt context and can’t sleep; work queues run in process context and can sleep.
• Ex: copies network data to memory buffers.

Interrupt Context
• Not associated with a process.
• Cannot sleep: no task to reschedule.
• current macro points to the interrupted process.
• Shares the kernel stack of the interrupted process.
  – Be very frugal in stack usage.

Registering a Handler
• request_irq(): register an interrupt handler on a given line.
• free_irq(): unregister a given interrupt handler.
  – Disables the interrupt line if all handlers are unregistered.
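The 8259 offset arithmetic described in the PIC slides can be checked with a small user-space sketch. It assumes the offsets of 32 (master) and 40 (slave) used in the IRQ example table, with the slave cascaded on master pin 2; `pic_irq_to_vector()` and `pic_master_pin()` are hypothetical helpers for illustration, not kernel functions.

```c
#include <assert.h>

/* Assumed Linux-style offsets programmed into the two 8259s. */
enum { MASTER_OFFSET = 32, SLAVE_OFFSET = 40, CASCADE_PIN = 2 };

/* Hypothetical helper: map a legacy IRQ line (0-15) to the IDT vector. */
int pic_irq_to_vector(int irq)
{
    assert(irq >= 0 && irq < 16);
    if (irq < 8)
        return MASTER_OFFSET + irq;      /* master: offset + pin index */
    return SLAVE_OFFSET + (irq - 8);     /* slave: its own offset + pin index */
}

/* Hypothetical helper: which master input pin is raised for this IRQ. */
int pic_master_pin(int irq)
{
    assert(irq >= 0 && irq < 16);
    return irq < 8 ? irq : CASCADE_PIN;  /* slave IRQs arrive via the cascade pin */
}
```

For example, IRQ 12 (PS/2 mouse) maps to vector 44 but reaches the CPU through master pin 2, which is why IRQ 2 itself is listed as "PIC cascading".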
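The handler-chain walk in handle_IRQ_event(), and the role of dev_id when several handlers are registered on one shared line, can be modeled in user space. This is a simplified sketch: `struct fake_dev`, `fake_dev_handler()` and `dispatch_shared_irq()` are hypothetical, the pt_regs argument and SA_* flags are omitted, and this `struct irqaction` keeps only the fields the loop needs.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins for the kernel's irqreturn_t values. */
typedef int irqreturn_t;
#define IRQ_NONE    0
#define IRQ_HANDLED 1

/* Simplified: only what the dispatch loop touches. */
struct irqaction {
    irqreturn_t (*handler)(int irq, void *dev_id);
    void *dev_id;               /* per-device cookie given at registration */
    struct irqaction *next;     /* next handler sharing the same line */
};

/* Hypothetical device whose "status register" says whether it raised the IRQ. */
struct fake_dev {
    int pending;
    int serviced;
};

irqreturn_t fake_dev_handler(int irq, void *dev_id)
{
    struct fake_dev *dev = dev_id;
    (void)irq;
    if (!dev->pending)
        return IRQ_NONE;        /* not our device: let the others check */
    dev->pending = 0;           /* "acknowledge" the device */
    dev->serviced++;
    return IRQ_HANDLED;
}

/* Run every handler on the line and OR the results, like handle_IRQ_event(). */
irqreturn_t dispatch_shared_irq(int irq, struct irqaction *action)
{
    irqreturn_t retval = IRQ_NONE;
    for (; action != NULL; action = action->next)
        retval |= action->handler(irq, action->dev_id);
    return retval;
}
```

Each handler uses its dev_id to decide whether its own device raised the line. ORing the results lets the caller tell a serviced interrupt from one that every handler rejected with IRQ_NONE.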
Registering a Handler
    int request_irq(unsigned int irq,
                    irqreturn_t (*handler)(int, void *, struct pt_regs *),
                    unsigned long irqflags,
                    const char *devname,
                    void *dev_id);

irqflags = SA_INTERRUPT | SA_SAMPLE_RANDOM | SA_SHIRQ
  – SA_INTERRUPT: run the handler with local interrupts disabled
  – SA_SAMPLE_RANDOM: use interrupt timing to feed the entropy pool
  – SA_SHIRQ: the line may be shared with other handlers

Writing an Interrupt Handler
    irqreturn_t ih(int irq, void *devid, struct pt_regs *r)
• Differentiating between devices
  – Pre-2.0: irq
  – Current: dev_id
• Registers
  – Pointer to registers before the interrupt occurred.
• Return values
  – IRQ_NONE: interrupt not for this handler.
  – IRQ_HANDLED: interrupt handled.

RTC Handler
    irqreturn_t rtc_interrupt(int irq, void *dev_id, struct pt_regs *regs)
    {
        /* Update shared RTC state under its lock */
        spin_lock(&rtc_lock);
        rtc_irq_data += 0x100;    /* bump interrupt count (upper bits) */
        rtc_irq_data &= ~0xff;    /* clear the status byte (low 8 bits) */
        if (rtc_status & RTC_TIMER_ON)
            mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100);
        spin_unlock(&rtc_lock);

        /* Now do the rest of the actions */
        spin_lock(&rtc_task_lock);
        if (rtc_callback)
            rtc_callback->func(rtc_callback->private_data);
        spin_unlock(&rtc_task_lock);
        wake_up_interruptible(&rtc_wait);               /* wake blocked readers */
        kill_fasync(&rtc_async_queue, SIGIO, POLL_IN);  /* notify SIGIO listeners */
        return IRQ_HANDLED;
    }

Interrupt Control
Disable/Enable Local Interrupts
    local_irq_disable();
    /* interrupts are disabled */
    local_irq_enable();
Saving and Restoring IRQ State
• Useful when you don’t know the prior IRQ state.
    unsigned long flags;
    local_irq_save(flags);
    /* interrupts are disabled */
    local_irq_restore(flags);
    /* interrupts in original state */

Interrupt Control
Disabling Specific Interrupts
• For legacy hardware; avoid for shared IRQ lines.
    disable_irq(irq);
    enable_irq(irq);
• What about other processors?
  – Disable local interrupts + spin lock.
  – We’ll talk about spin locks next time…

Bottom Halves
• Perform most of the work required by the interrupt.
• Run with interrupts enabled (only work queues run in process context).
• Three forms of deferring work:
  – SoftIRQs
  – Tasklets
  – Work queues

SoftIRQs
• Statically allocated at compile time.
• Only 32 softIRQs can exist (only 6 currently used).
    struct softirq_action {
        void (*action)(struct softirq_action *);
        void *data;
    };
    static struct softirq_action softirq_vec[32];
• Tasklets built on SoftIRQs.
  – Tasklets share softIRQs: normal ones use TASKLET, high-priority ones use HI.
  – Dynamically allocated.

SoftIRQ Handlers
• Prototype
    void softirq_handler(struct softirq_action *)
• Calling
    my_softirq->action(my_softirq);
• Pre-emption
  – SoftIRQs don’t pre-empt other softIRQs.
  – Interrupt handlers can pre-empt softIRQs.
  – SoftIRQs (even the same one) can run concurrently on other CPUs.

Executing SoftIRQs
• Interrupt handler marks the softIRQ as pending.
  – Called raising the softIRQ.
• SoftIRQs are checked for execution:
  – In the return from hardware interrupt code.
  – In the ksoftirqd kernel thread.
  – In any code that explicitly checks for softIRQs.
• do_softirq()
  – Loops over all pending softIRQs.

Current SoftIRQs
SoftIRQ   Priority   Description
HI        0          High-priority tasklets
TIMER     1          Timer bottom half
NET_TX    2          Send network packets
NET_RX    3          Receive network packets
SCSI      4          SCSI bottom half
TASKLET   5          Tasklets

Tasklets
• Implemented as softIRQs.
  – Linked list of tasklet_struct objects.
• Two priorities of tasklets:
  – HI: tasklet_hi_schedule()
  – TASKLET: tasklet_schedule()
• Scheduled tasklets run via do_softirq()
  – HI action: tasklet_hi_action()
  – TASKLET action: tasklet_action()

ksoftirqd
• SoftIRQs may occur at high frequencies.
• SoftIRQs may re-raise themselves.
• Kernel will not handle re-raised softIRQs immediately in do_softirq().
• Kernel thread ksoftirqd solves the problem.
  – One thread per processor.
  – Runs at lowest priority (nice +19).

Work Queues
• Defer work into a kernel thread.
  – Execute in process context.
• One thread per processor: events/n.
• Kernel code can create its own work queues (and threads) if needed.
    struct workqueue_struct {
        struct cpu_workqueue_struct cpu_wq[NR_CPUS];
        const char *name;
        struct list_head list; /* Empty if single thread */
    };

Work Queue Data Structures
[Figure: worker thread ↔ cpu_workqueue_struct (1 per CPU); workqueue_struct (1 per thread type); list of work_struct (1 per deferrable function)]

Worker Thread
Each thread runs worker_thread():
1. Marks self as sleeping.
2. Adds self to wait queue.
3. If the linked list of work is empty, calls schedule().
4. Else, marks self as running and removes self from the wait queue.
5. Calls run_workqueue() to perform the work.

run_workqueue()
1. Loops through the list of work_structs:
    struct work_struct {
        unsigned long pending;
        struct list_head entry;
        void (*func)(void *);
        void *data;
        void *wq_data;
        struct timer_list timer;
    };
2. Retrieves the function, func, and its argument, data.
3. Removes the entry from the list, clears pending.
4. Invokes the function.

Which Bottom Half to Use?
1. If it needs to sleep, use a work queue.
2. If it doesn’t need to sleep, use a tasklet.
3. What about serialization needs?

Bottom Half   Context     Serialization
Softirq       Interrupt   None
Tasklet       Interrupt   Against same tasklet
Work queues   Process     None

Timer Interrupt
• Executed HZ times a second.
    #define HZ 1000 /* <asm/param.h> */
  – Called the tick rate.
  – Time between two interrupts is a tick.
• Driven by the Programmable Interval Timer (PIT).
• Interrupt handler responsibilities:
  – Updating uptime, system time, kernel stats.
  – Rescheduling if current has exhausted its time slice.
  – Balancing scheduler runqueues.
  – Running dynamic timers.

Jiffies
• Jiffies = number of ticks since boot.
    extern unsigned long volatile jiffies;
• Incremented each timer interrupt.
• Uptime = jiffies/HZ seconds.
• Convert for user space: jiffies_to_clock_t()
• Comparing jiffies while avoiding overflow:
  – time_after(a, b): a > b
  – time_before(a, b): a < b
  – time_after_eq(a, b): a >= b
  – time_before_eq(a, b): a <= b

Timer Interrupt Handler
1. Increments jiffies.
2. Updates resource usages (sys + user time).
3. Runs dynamic timers.
4. Executes scheduler_tick().
5. Updates wall time.
6. Calculates load average.
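Raising and running softIRQs boils down to a pending bitmask that is scanned in priority order. The user-space sketch below models that idea with the slides’ softirq_action structure. `open_softirq_sim()`, `raise_softirq_sim()`, `do_softirq_sim()` and the demo handler are hypothetical; there is no locking or per-CPU state; and, following the ksoftirqd slide, a softIRQ re-raised during a pass is left pending instead of being run again immediately (the real kernel’s restart policy differs in detail).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SOFTIRQS 32

/* Same shape as the slides' struct softirq_action. */
struct softirq_action {
    void (*action)(struct softirq_action *);
    void *data;
};

static struct softirq_action softirq_vec[NR_SOFTIRQS];
static uint32_t softirq_pending;        /* bit n set => softIRQ n is pending */

/* Hypothetical registration helper: fill slot nr of the static table. */
void open_softirq_sim(int nr, void (*action)(struct softirq_action *), void *data)
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    softirq_vec[nr].action = action;
    softirq_vec[nr].data = data;
}

/* "Raising" only marks the softIRQ pending; nothing runs yet. */
void raise_softirq_sim(int nr)
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    softirq_pending |= UINT32_C(1) << nr;
}

/* One pass: snapshot and clear the mask, then run each pending softIRQ,
 * lowest number (highest priority) first. Anything raised during the pass
 * stays pending for a later pass. Returns the mask left pending. */
uint32_t do_softirq_sim(void)
{
    uint32_t pending = softirq_pending;
    softirq_pending = 0;
    for (int nr = 0; pending != 0; nr++, pending >>= 1)
        if ((pending & 1u) && softirq_vec[nr].action != NULL)
            softirq_vec[nr].action(&softirq_vec[nr]);
    return softirq_pending;
}

/* Hypothetical demo handler: counts its runs and can re-raise itself. */
struct demo_state { int runs; int nr; int reraise; };

void demo_action(struct softirq_action *a)
{
    struct demo_state *s = a->data;
    s->runs++;
    if (s->reraise)
        raise_softirq_sim(s->nr);       /* deferred to the next pass */
}
```

With TIMER (1) and NET_RX (3) raised and the NET_RX handler re-raising itself, one call runs both handlers and returns a mask with only bit 3 set; that leftover work is what ksoftirqd exists to pick up.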
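The overflow-safe jiffies comparisons work by subtracting as unsigned (which wraps) and reading the difference as signed. This sketch simulates a 32-bit jiffies counter so the wrap is easy to reach; the `sim_time_*` names and the `jiffies32_t` type are hypothetical, and the cast to int32_t assumes two’s-complement conversion as GCC and Clang implement it. The real macros in <linux/jiffies.h> apply the same subtract-and-check-sign idea to unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit jiffies: wraps after 2^32 ticks (about 49.7 days at HZ=1000). */
typedef uint32_t jiffies32_t;

/* a is after b if (b - a), read as signed, is negative. */
static inline int sim_time_after(jiffies32_t a, jiffies32_t b)
{
    return (int32_t)(b - a) < 0;
}

static inline int sim_time_before(jiffies32_t a, jiffies32_t b)
{
    return sim_time_after(b, a);
}

static inline int sim_time_after_eq(jiffies32_t a, jiffies32_t b)
{
    return (int32_t)(a - b) >= 0;
}

static inline int sim_time_before_eq(jiffies32_t a, jiffies32_t b)
{
    return sim_time_after_eq(b, a);
}
```

A naive `timeout > now` test fails as soon as the deadline wraps past zero, which is exactly the case these comparisons guard against. The trick only holds while the two values are less than half the counter range apart.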