Interrupts and Exceptions COMS W4118 Spring 2008

advertisement
Interrupts and
Exceptions
COMS W4118
Spring 2008
Overview











CPU Control Flow
Interrupts and Exceptions
Exception Types and Handling
Interrupt Request Lines (IRQs)
Programmable Interrupt Controllers (PIC)
Interrupt Descriptor Table (IDT)
Hardware Dispatching of Interrupts
Nested Execution
Kernel Stacks
SoftIRQs, Tasklets
Work Queues
Simplified Architecture Diagram
Central
Processing
Unit
Main
Memory
system bus
I/O
device
I/O
device
I/O
device
I/O
device
Motivation




Utility of a general-purpose computer depends on its
ability to interact with I/O devices attached to it (e.g.,
keyboard, display, disk-drives, network, etc.)
Devices require a prompt response from the CPU
when various events occur, even when the CPU is
busy running a program
Need a mechanism for a device to “gain CPU’s
attention”
Interrupts provide a way doing this
Interrupts


Forcibly change normal flow of control
Similar to context switch (but lighter weight)




Hardware saves some context on stack; Includes
interrupted instruction if restart needed
Enters kernel at a specific point; kernel then
figures out which interrupt handler should run
Execution resumes with special “iret” instruction
Many different types of interrupts
Types of Interrupts

Asynchronous



From external source, such as I/O device
Not related to instruction being executed
Synchronous (also called exceptions)


Processor-detected exceptions:
 Faults — correctable; offending instruction is retried
 Traps — often for debugging; instruction is not retried
 Aborts — major error (hardware failure)
Programmed exceptions:
 Requests for kernel intervention (software intr/syscalls)
CPU’s ‘fetch-execute’ cycle
User
Program
Fetch instruction at IP
ld
add
IP
st
Save context
Decode the fetched instruction
mul
ld
Get INTR ID
Execute the decoded instruction
sub
Lookup ISR
bne
add
Advance IP to next instruction
Execute ISR
jmp
…
IRQ?
no
yes
IRET
Faults


Instruction would be illegal to execute
Examples:






Writing to a memory segment marked ‘read-only’
Reading from an unavailable memory segment (on disk)
Executing a ‘privileged’ instruction
Detected before incrementing the IP
The causes of ‘faults’ can often be ‘fixed’
If a ‘problem’ can be remedied, then the CPU can
just resume its execution-cycle
Traps



A CPU might have been programmed to
automatically switch control to a ‘debugger’
program after it has executed an instruction
That type of situation is known as a ‘trap’
It is activated after incrementing the IP
Error Exceptions



Most error exceptions — divide by zero, invalid
operation, illegal memory reference, etc. — translate
directly into signals
This isn’t a coincidence. . .
The kernel’s job is fairly simple: send the
appropriate signal to the current process




force_sig(sig_number, current);
That will probably kill the process, but that’s not the
concern of the exception handler
One important exception: page fault
An exception can (infrequently) happen in the kernel

die(); // kernel oops
Intel-Reserved ID-Numbers



Of the 256 possible interrupt ID numbers, Intel reserves the first 32
for ‘exceptions’
OS’s such as Linux are free to use the remaining 224 available
interrupt ID numbers for their own purposes (e.g., for servicerequests from external devices, or for other purposes such as
system-calls)
Examples:

0: divide-overflow fault

6: Undefined Opcode

7: Coprocessor Not Available
 11: Segment-Not-Present fault
 12: Stack fault
 13: General Protection Exception
 14: Page-Fault Exception
Interrupt Hardware
Legacy PC Design
(for single-proc
systems)
Ethernet
IRQs
Slave
PIC
(8259)
SCSI Disk
Master
PIC
(8259)
INTR
x86
CPU
Real-Time Clock
Keyboard Controller



Programmable Interval-Timer
I/O devices have (unique or shared) Interrupt Request
Lines (IRQs)
IRQs are mapped by special hardware to interrupt
vectors, and passed to the CPU
This hardware is called a Programmable Interrupt
Controller (PIC)
The `Interrupt Controller’





Responsible for telling the CPU when a specific
external device wishes to ‘interrupt’
 Needs to tell the CPU which one among several
devices is the one needing service
PIC translates IRQ to vector
 Raises interrupt to CPU
 Vector available in register
 Waits for ack from CPU
Interrupts can have varying priorities
 PIC also needs to prioritize multiple requests
Possible to “mask” (disable) interrupts at PIC or CPU
Early systems cascaded two 8 input chips (8259A)
Example: Interrupts on 80386


80386 core has one interrupt line, one interrupt
acknowledge line
Interrupt sequence:



Interrupt controller raises INT line
80386 core pulses INTA line low, allowing INT to go low
80386 core pulses INTA line low again, signaling controller
to put interrupt number on data bus
INT:
INTA:
Data bus:
Interrupt #
Multiple Logical Processors
Multi-CORE CPU
CPU
0
CPU
1
LOCAL
APIC
LOCAL
APIC
I/O
APIC
Advanced Programmable Interrupt Controller is needed to
perform ‘routing’ of I/O requests from peripherals to CPUs
(The legacy PICs are masked when the APICs are enabled)
APIC, IO-APIC, LAPIC

Advanced PIC (APIC) for SMP systems




Local APIC (LAPIC) versus “frontend” IO-APIC



Used in all modern systems
Interrupts “routed” to CPU over system bus
IPI: inter-processor interrupt
Devices connect to front-end IO-APIC
IO-APIC communicates (over bus) with Local APIC
Interrupt routing




Allows broadcast or selective routing of interrupts
Ability to distribute interrupt handling load
Routes to lowest priority process
 Special register: Task Priority Register (TPR)
Arbitrates (round-robin) if equal priority
Assigning IRQs to Devices







IRQ assignment is hardware-dependent
Sometimes it’s hardwired, sometimes it’s set physically,
sometimes it’s programmable
PCI bus usually assigns IRQs at boot
Some IRQs are fixed by the architecture
 IRQ0: Interval timer
 IRQ2: Cascade pin for 8259A
Linux device drivers request IRQs when the device is opened
Note: especially useful for dynamically-loaded drivers, such as
for USB or PCMCIA devices
Two devices that aren’t used at the same time can share an IRQ,
even if the hardware doesn’t support simultaneous sharing
Assigning Vectors to IRQs


Vector: index (0-255) into interrupt descriptor table
Vectors usually IRQ# + 32




Below 32 reserved for non-maskable intr & exceptions
Maskable interrupts can be assigned as needed
Vector 128 used for syscall
Vectors 251-255 used for IPI
Interrupt Descriptor Table


The ‘entry-point’ to the interrupt-handler is located
via the Interrupt Descriptor Table (IDT)
IDT: “gate descriptors”



Segment selector + offset for handler
Descriptor Privilege Level (DPL)
Gates (slightly different ways of entering kernel)
 Task gate: includes TSS to transfer to (not used by
Linux)


Interrupt gate: disables further interrupts
Trap gate: further interrupts still allowed
Interrupt Masking




Two different types: global and per-IRQ
Global — delays all interrupts
Selective — individual IRQs can be masked
selectively
Selective masking is usually what’s needed
— interference most common from two
interrupts of the same type
Putting It All Together
Memory Bus
IRQs
0
idtr
PIC
INTR
CPU
0
IDT
vector
N
handler
Mask points
255
Dispatching Interrupts







Each interrupt has to be handled by a special
device- or trap-specific routine
Interrupt Descriptor Table (IDT) has gate descriptors
for each interrupt vector
Hardware locates the proper gate descriptor for this
interrupt vector, and locates the new context
A new stack pointer, program counter, CPU and
memory state, etc., are loaded
Global interrupt mask set
The old program counter, stack pointer, CPU and
memory state, etc., are saved on the new stack
The specific handler is invoked
Nested Interrupts



What if a second interrupt occurs while an
interrupt routine is excuting?
Generally a good thing to permit that — is it
possible?
And why is it a good thing?
Maximizing Parallelism



You want to keep all I/O devices as busy as
possible
In general, an I/O interrupt represents the
end of an operation; another request should
be issued as soon as possible
Most devices don’t interfere with each others’
data structures; there’s no reason to block
out other devices
Handling Nested Interrupts




As soon as possible, unmask the global
interrupt
As soon as reasonable, re-enable interrupts
from that IRQ
But that isn’t always a great idea, since it
could cause re-entry to the same handler
IRQ-specific mask is not enabled during
interrupt-handling
Nested Execution

Interrupts can be interrupted





Exceptions can be interrupted


By different interrupts; handlers need not be reentrant
No notion of priority in Linux
Small portions execute with interrupts disabled
Interrupts remain pending until acked by CPU
By interrupts (devices needing service)
Exceptions can nest two levels deep



Exceptions indicate coding error
Exception code (kernel code) shouldn’t have bugs
Page fault is possible (trying to touch user data)
First-Level Interrupt Handler




Often in assembler
Perform minimal, common functions: saving
registers, unmasking other interrupts
Eventually, undoes that: restores registers,
returns to previous context
Most important: call proper second-level
interrupt handler (C program)
Interrupt Handling Philosophy



Do as little as possible in the interrupt handler
Defer non-critical actions till later
Structure: top and bottom halves




Top-half: do minimum work and return (ISR)
Bottom-half: deferred processing (softirqs,
tasklets)
Again — want to do as little as possible with
IRQ interrupts masked
No process context available
No Process Context





Interrupts (as opposed to exceptions) are not
associated with particular instructions
They’re also not associated with a given
process
The currently-running process, at the time of
the interrupt, as no relationship whatsoever to
that interrupt
Interrupt handlers cannot refer to current
Interrupt handlers cannot sleep!
Interrupt Stack

When an interrupt occurs, what stack is
used?




Exceptions: The kernel stack of the current
process, whatever it is, is used (There’s always
some process running — the “idle” process, if
nothing else)
Interrupts: hard IRQ stack (1 per processor)
SoftIRQs: soft IRQ stack (1 per processor)
These stacks are configured in the IDT and
TSS at boot time by the kernel
Finding the Proper Handler




On modern hardware, multiple I/O devices
can share a single IRQ and hence interrupt
vector
First differentiator is the interrupt vector
Multiple interrupt service routines (ISR) can
be associated with a vector
Each device’s ISR for that IRQ is called; the
determination of whether or not that device
has interrupted is device-dependent
Deferrable Work

We don’t want to do too much in regular interrupt
handlers:






Interrupts are masked
We don’t want the kernel stack to grow too much
Instead, interrupt handlers schedule work to be
performed later
Three deferred work mechanisms: softirqs, tasklets,
and work queues
Tasklets are built on top of softirqs
For all of these, requests are queued
Softirqs


Statically allocated: specified at kernel compile time
Limited number:
Priority Type
0
High-priority tasklets
1
Timer interrupts
2
Network transmission
3
Network reception
4
SCSI disks
5
Regular tasklets
Running Softirqs



Run at various points by the kernel
Most important: after handling IRQs and after
timer interrupts
Softirq routines can be executed
simultaneously on multiple CPUs:


Code must be re-entrant
Code must do its own locking as needed
Rescheduling Softirqs




A softirq routine can reschedule itself
This could starve user-level processes
Softirq scheduler only runs a limited number
of requests at a time
The rest are executed by a kernel thread,
ksoftirqd, which competes with user
processes for CPU time
Tasklets





Similar to softirqs
Created and destroyed dynamically
Individual tasklets are locked during
execution; no problem about re-entrancy, and
no need for locking by the code
Only one instance of tasklet can run, even
with multiple CPUs
Preferred mechanism for most deferred
activity
Work Queues




Always run by kernel threads
Softirqs and tasklets run in an interrupt
context; work queues have a process context
Because they have a process context, they
can sleep
However, they’re kernel-only; there is no user
mode associated with it
Return Code Path

Interleaved assembly entry points:





ret_from_exception()
ret_from_intr()
ret_from_sys_call()
ret_from_fork()
Things that happen:


Run scheduler if necessary
Return to user mode if no nested handlers




Restore context, user-stack, switch mode
Re-enable interrupts if necessary
Deliver pending signals
(Some DOS emulation stuff – VM86 Mode)
Monitoring Interrupt Activity


Linux has a pseudo-file system, /proc, for
monitoring (and sometimes changing) kernel
behavior
Run
cat /proc/interrupts
to see what’s going on
/proc/interrupts
$ cat /proc/interrupts
CPU0
0: 865119901
1:
4
2:
0
8:
1
12:
20
14:
6532494
15:
34
16:
0
19:
0
23:
0
32:
40
33:
40
48: 273306628
NMI:
0
ERR:
0

IO-APIC-edge
IO-APIC-edge
XT-PIC
IO-APIC-edge
IO-APIC-edge
IO-APIC-edge
IO-APIC-edge
IO-APIC-level
IO-APIC-level
IO-APIC-level
IO-APIC-level
IO-APIC-level
IO-APIC-level
timer
keyboard
cascade
rtc
PS/2 Mouse
ide0
ide1
usb-uhci
usb-uhci
ehci-hcd
ioc0
ioc1
eth0
Columns: IRQ, count, interrupt controller, devices
More in /proc/pci:
$ cat /proc/pci
PCI devices found:
Bus 0, device
0, function 0:
Host bridge: PCI device 8086:2550 (Intel Corp.) (rev 3).
Prefetchable 32 bit memory at 0xe8000000 [0xebffffff].
Bus 0, device 29, function 1:
USB Controller: Intel Corp. 82801DB USB (Hub #2) (rev 2).
IRQ 19.
I/O at 0xd400 [0xd41f].
Bus 0, device 31, function 1:
IDE interface: Intel Corp. 82801DB ICH4 IDE (rev 2).
IRQ 16.
I/O at 0xf000 [0xf00f].
Non-prefetchable 32 bit memory at 0x80000000 [0x800003ff].
Bus 3, device
1, function 0:
Ethernet controller: Broadcom NetXtreme BCM5703X Gigabit Eth (rev 2).
IRQ 48.
Master Capable. Latency=64. Min Gnt=64.
Non-prefetchable 64 bit memory at 0xf7000000 [0xf700ffff].
Portability



Which has a higher priority, a disk interrupt or
a network interrupt?
Different CPU architectures make different
decisions
By not assuming or enforcing any priority,
Linux becomes more portable
Summary









Exception vs. Interrupt
 Synchronous vs. Asynchronous
 Fault vs. Trap
Relation between exceptions and signals
Device  IRQs  APICs  Interrupt
Hardware handling (vector)
Interrupt masking
ISRs can share IRQs
Interrupt stacks (Kernel, HardIRQ, SoftIRQ)
SoftIRQs, tasklets, workqueues, ksoftirqd
 Interrupt context vs. process context
 Who can block
Opportunities for rescheduling (preempting)
Backup Foils
SoftIRQs vs. Tasklet vs. WQ
ISR
SoftIRQ
Tasklet
WorkQueue
Will disable all interrupts?
Briefly
No
No
No
Will disable other instances of self?
Yes
Yes
No
No
Higher priority than regular scheduled tasks?
Yes
Yes*
Yes*
No
Will be run on same processor as ISR?
N/A
Yes
Maybe
Maybe
More than one run can on same CPU?
No
No
No
Yes
Same one can run on multiple CPUs?
Yes
Yes
No
Yes
Full context switch?
No
No
No
Yes
Can sleep? (Has own kernel stack)
No
No
No
Yes
Can access user space?
No
No
No
No
*Within limits, can be run by ksoftirqd
Three crucial data-structures

The Global Descriptor Table (GDT)
defines the system’s memory-segments and their
access-privileges, which the CPU has the duty to
enforce

The Interrupt Descriptor Table (IDT)
defines entry-points for the various code-routines that
will handle all ‘interrupts’ and ‘exceptions’

The Task-State Segment (TSS)
holds the values for registers SS and ESP that will get
loaded by the CPU upon entering kernel-mode
How does CPU find GDT/IDT?


Two dedicated registers: GDTR and IDTR
Both have identical 48-bit formats:
Segment Base Address
47
Segment Limit
16 15
0
Kernel must setup these registers during system startup (set-and-forget)
Privileged instructions: LGDT and LIDT used to set these register-values
Unprivileged instructions: SGDT and SIDT used for reading register-values
How does CPU find the TSS?

Dedicated system segment-register TR holds
a descriptor’s offset into the GDT
The kernel must set up the
GDT and TSS structures
and must load the GDTR
and the TR registers
GDT
TSS
TR
The CPU knows the layout
of fields in the Task-State
Segment
GDTR
IDT Initialization

Initialized once by BIOS in real mode


Must not expose kernel to user mode access


Linux re-initializes during kernel init
start by zeroing all descriptors
Linux lingo:



Interrupt gate (same as Intel; no user access)
 Not accessible from user mode
System gate (Intel trap gate; user access)
 Used for int, int3, into, bounds
Trap gate (same as Intel; no user access)
 Used for exceptions
Interrupt Processing

BUILD_IRQ macro generates:


IRQn_interrupt:
 pushl $n-256 // negative to distinguish syscalls
 jmp common_interrupt
Common code:


common_interrupt:
 SAVE_ALL // save a few more registers than hardware
 call do_IRQ
 jmp $ret_from_intr
do_IRQ() is C code that handles all interrupts
Low-level IRQ Processing

do_IRQ():







get vector, index into irq_desc for appropriate struct
grab per-vector spinlock, ack (to PIC) and mask line
set flags (IRQ_PENDING)
really process IRQ? (may be disabled, etc.)
call handle_IRQ_event()
some logic for handling lost IRQs on SMP systems
handle_IRQ_event():


enable interrupts if needed (SA_INTERRUPT clear)
execute all ISRs for this vector:
 action->handler(irq, action->dev_id, regs);
IRQ Data Structures

irq_desc: array of IRQ descriptors




status (flags), lock, depth (for nested disables)
handler: PIC device driver!
action: linked list of irqaction structs (containing ISRs)
irqaction: ISR info


handler: actual ISR!
flags:






SA_INTERRUPT: interrupts disabled if set
SA_SHIRQ: sharing allowed
SA_SAMPLE_RANDOM: input for /dev/random entropy pool
name: for /proc/interrupts
dev_id, next
irq_stat: per-cpu counters (for /proc/interrupts)
Hardware Handling

On entry:




Which vector?
Get corresponding descriptor in IDT
Find specified descriptor in GDT (for handler)
Check privilege levels (CPL, DPL)



Save eflags, cs, (original) eip on stack
Jump to appropriate handler


If entering kernel mode, set kernel stack
Assembly code prepares C stack, calls handler
On return (i.e. iret):



Restore registers from stack
If returning to user mode, restore user stack
Clear segment registers (if privileged selectors)
Interrupt Handling

More complex than exceptions


Requires registry, deferred processing, etc.
Three types of actions:

Critical: Top-half (interrupts disabled – briefly!)


Non-critical: Top-half (interrupts enabled)


Example: acknowledge interrupt
Example: read key scan code, add to buffer
Non-critical deferrable: Do it “later” (interrupts enabled)


Example: copy keyboard buffer to terminal handler process
Softirqs, tasklets
Download