Computer Architecture - Overview User Mode Execution processor architecture

Computer Architecture - Overview
processor architecture
– privileged execution modes
– asynchronous exceptions (traps)
I/O architecture
– busses, controllers, devices, smart controllers
– I/O: direct, polled, mapped, DMA, interrupt driven
– sequential and random access devices
– disks and factors affecting disk I/O performance
You need to understand how these really work

User Mode Execution
able to use all of the “normal” instructions
– load and store general registers from/to memory
– arithmetic, logical, test, compare, data copying
– branches and subroutine calls
able to address some subset of memory
– what is controlled by a Memory Management Unit
not able to perform privileged operations
– I/O operations, update the MMU
– interrupt enables, enter supervisor mode
computer and I/O architecture
3/5/03 - 1
Supervisor Mode Execution
can execute privileged instructions
– able to perform I/O operations
– interrupt enable/disable/return, load PS
– instructions to change processor mode
can access privileged address spaces
– access data structures inside the OS
– access other process's address spaces
– change and create address spaces
may have alternate registers, alternate stack

Processor Status Register
contains condition codes
– set by arithmetic/logical operations (0,+,-,ovflo)
– tested by conditional branch instructions
controls execution mode (user/supervisor)
describes which interrupts are enabled
may describe what address space to use
may control other processor features/options
–
word length, endian-ness, instruction set, ...
Choice of Execution Modes
computer boots up in supervisor mode
– used by bootstrap and OS to initialize the system
applications run in user mode
– OS changes to user mode before running user code
– user programs cannot do I/O, restricted address space
– they have no way to get into supervisor mode
  because instructions to change the PS are privileged
reentering supervisor mode is strictly controlled
– only happens in response to traps and interrupts

Asynchronous Exceptions and Handlers
most errors can be handled “in-line”
– arithmetic overflows are reflected in condition codes
– program can test for, and handle such conditions
some errors must interrupt program execution
– e.g. CPU was unable to execute this instruction
– there must be a way to inform OS if this happens
most computers accomplish this with “traps”
– a well specified list of all possible exceptions
– a means for the OS to associate handlers with each

Trap Handling
[figure: the Application Program (... instr; instr; instr; bad instr; instr; instr; instr ...) runs in user mode; the bad instruction selects a PS/PC pair from the TRAP vector table and enters the 1st level trap handler in supervisor mode (saves registers and selects 2nd level handler), which calls the 2nd level handler (actually deals with the problem) before the return to user mode]

(Transition into Supervisor Mode)
hardware trap handling
– use trap cause to index into trap vector table for PC/PS
– load new processor status word, switch to supv mode
– push PC/PS of program that caused trap onto stack
– load new program counter (w/addr of 1st level handler)
software trap handling
– 1st level handler pushes all other registers onto stack
– 1st level handler gathers info, selects 2nd level handler
– 2nd level handler deals with the exception condition
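
The hardware and software trap steps can be modelled in ordinary user-space C. The sketch below is only an illustration: every name in it (struct cpu, take_trap, skip_instruction, the two trap causes) is hypothetical, and the "hardware" steps and the 1st level handler are folded into one function for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the trap sequence; not a real CPU or OS API. */
enum mode { USER_MODE = 0, SUPERVISOR_MODE = 1 };
enum trap_cause { TRAP_ILLEGAL_INSTR = 0, TRAP_ZERO_DIVIDE = 1, NUM_TRAPS = 2 };

struct context { unsigned pc; enum mode ps; };       /* the PC/PS pair */
typedef int (*handler_fn)(struct context *saved);    /* 2nd level handler */

struct cpu {
    struct context cur;             /* running PC and PS */
    struct context stack[8];        /* supervisor stack of saved PC/PS */
    size_t sp;                      /* number of saved contexts */
    handler_fn vector[NUM_TRAPS];   /* trap vector table, filled in by the "OS" */
};

/* Trap entry, dispatch through the vector table, and privileged return. */
int take_trap(struct cpu *c, enum trap_cause cause)
{
    assert((int)cause >= 0 && cause < NUM_TRAPS);
    assert(c->sp < 8 && c->vector[cause] != NULL);

    c->stack[c->sp++] = c->cur;     /* push PC/PS of the trapping program */
    c->cur.ps = SUPERVISOR_MODE;    /* load new PS: now in supervisor mode */

    /* the handler may edit the saved context before we return */
    int result = c->vector[cause](&c->stack[c->sp - 1]);

    c->cur = c->stack[--c->sp];     /* restore PC/PS, back to the old mode */
    return result;
}

/* Example 2nd level handler: resume after the trapped instruction. */
int skip_instruction(struct context *saved)
{
    saved->pc += 1;
    return 0;
}
```

A caller would fill in the vector table once (for example c.vector[TRAP_ILLEGAL_INSTR] = skip_instruction) and then call take_trap. After the call the saved mode is restored and the PC reflects whatever the handler wrote into the saved context.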
Control of Supervisor mode transitions
all user->supervisor changes are via traps/interrupts
– it is difficult to know when these will happen
there is a designated handler for each trap/intr
– its address is stored in a trap/interrupt vector table
– the operating system sets up all of the handler vectors
ordinary programs can't access these vectors
– vectors are not in the process' address spaces
the OS controls all supervisor mode transitions
– by carefully controlling all of the trap/intr “gateways”

Dealing with the cause of a trap
some exceptions are handled by the OS
– e.g. page faults, alignment, floating point emulation
– OS simulates expected behavior and returns
some exceptions may be fatal to running task
– e.g. zero divide, illegal instruction, invalid address
– OS reflects the failure back to the running process
some exceptions may be fatal to the system
– e.g. power failure, cache parity, stack violation
– OS cleanly shuts down the affected hardware
Stacking and unstacking a trap
[figure: the user mode stack holds the user mode computation; the supervisor mode stack, in its direction of growth, holds the user-mode PC and PS, the saved user-mode registers, the parameters to the 2nd level handler, the return PC, and the stack frame for the 2nd level handler]

(Returning to User Mode)
return is opposite of interrupt/trap entry
– 2nd level system call handler returns to 1st level handler
– 1st level handler restores all registers from stack
– use privileged return instruction to restore PC/PS
– resume user-mode execution after trapped instruction
saved registers can be changed before return
– used to set entry point for newly loaded programs
– used to deliver signals to user-mode processes
– used to set return codes from system calls
Traps while in Supervisor Mode
nearly identical to traps while in user mode
– trap saves interrupted PC/PS on supervisor mode stack
– trap goes to same vector & 1st level handler
– same register saving, restoring, and return
there are very few differences
– saved PS at time of interrupt shows supervisor mode
– 2nd level handler knows trap was from supervisor mode
  (and may consider it to be more or less severe than the
  same trap from user mode)

I/O architectures: busses
[figure: the CPU, Memory, and device Controllers all attach to the main bus, which carries control, data, address, and interrupts; a Controller connects its Device to the bus]
Memory type busses
came from back-plane memory-to-CPU interconnects
– a few “bus masters”, and many “slave devices”
– arbitrated multi-cycle bus transactions
  request, grant, address, respond, transfer, ack
operations: read, write, read/modify/write, interrupt
originally most busses were of this sort
– ISA, EISA, PCMCIA, PCI, cPCI, video busses, ...
– distinguished by form-factor, speed, data width, ...
– newer busses support bridging, hot-swap, self-identifying

Network type busses
evolved as peripheral device interconnects
– SCSI, USB, 1394 (firewire), Infiniband, ...
– cables and connectors rather than back-planes
– designed for easy and dynamic extensibility
– originally slower than back-plane, but no longer
much more similar to a general purpose network
– packet switched, topology, routing, node identity
– may be master/slave (USB) or peer-to-peer (1394)
– may be implemented by controller or by host
I/O architectures: devices & controllers
I/O devices
– peripheral devices that interface between the computer
  and other media (disks, tapes, networks, serial ports,
  keyboards, displays, pointing devices, etc.)
device controllers connect a device to a bus
– communicate control operations to device
– relay status information back to the bus
– manage DMA transfers for the device
– generate interrupts for the device
a controller is usually specific to a device and a bus

mechanisms: device controller registers
device controllers export registers to the bus
– registers in controller can be addressed from bus
– writing into registers controls device or sends data
– reading from registers obtains data/status
register access method varies with CPU type
– may require special instructions (e.g. x86 IN/OUT)
  privileged instructions restricted to supervisor mode
– may be mapped onto bus like memory
  accessed with normal (load/store) instructions
  I/O address space not accessible to most processes

A simple device: 16550 UART
A 16550 presents seven 8-bit registers to the bus.
0: data – read received byte, write to transmit a byte
   (or LSB of speed divisor when speed set is enabled)
1: interrupt enables – for transmit done, data received, cd/ring
   (or MSB of speed divisor when speed set is enabled)
2: interrupt registers – currently pending interrupt conditions
3: line control register – character length, parity and speed
4: modem control register – control signals sent by computer
5: line status register – xmt/rcv completion and error conditions
6: modem status registers – received modem control signals
All communication between the bus and the device (send data,
receive data, status and control) is performed by reading from,
and writing to these registers.

(16550 UART registers)
offset  Register                    contents (x = unused bit)
0       Data Register               received/transmitted byte
1       Interrupt Enable Register   x ... MDM STS XMT RCV
2       Interrupt Register          x ... MDM STS XMT RCV
3       Line Control Register       speed BRK PARITY STOP WORDLEN
4       Modem Control Register      x ... DTR RTS
5       Line Status Register        RCV EMT XMT BRK FER PER OVR RER
6       Modem Status Register       DCD RI DSR CTS
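
The dual use of offsets 0 and 1 (data and interrupt enables normally, speed divisor halves when speed set is enabled) can be sketched with a simulated register file. This is an illustration under stated assumptions, not a driver: struct uart_sim and both functions are hypothetical, LCR_SPEED_SET assumes the speed-set control is the top bit of the line control register, and UART_BASE_RATE assumes the common 1.8432 MHz clock divided by 16. Real hardware would be reached with inb/outb or mapped loads and stores instead of an array.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets as listed in the notes. */
enum uart_reg {
    UART_DATA = 0, UART_IER = 1, UART_IIR = 2, UART_LCR = 3,
    UART_MCR = 4, UART_LSR = 5, UART_MSR = 6, UART_NREGS = 7
};

#define LCR_SPEED_SET  0x80u     /* assumption: top LCR bit enables divisor access */
#define UART_BASE_RATE 115200u   /* assumption: 1.8432 MHz clock / 16 */

/* Hypothetical stand-in for the controller's registers. */
struct uart_sim {
    uint8_t reg[UART_NREGS];
    uint8_t div_lsb, div_msb;    /* speed divisor halves */
};

/* Writes to offsets 0 and 1 are redirected while speed set is enabled. */
void uart_sim_write(struct uart_sim *u, enum uart_reg r, uint8_t v)
{
    assert((int)r >= 0 && r < UART_NREGS);
    int speed_set = (u->reg[UART_LCR] & LCR_SPEED_SET) != 0;

    if (speed_set && r == UART_DATA)
        u->div_lsb = v;
    else if (speed_set && r == UART_IER)
        u->div_msb = v;
    else
        u->reg[r] = v;
}

/* Program a line speed: enable speed set, write both halves, disable it. */
void uart_sim_set_speed(struct uart_sim *u, unsigned baud)
{
    assert(baud > 0 && baud <= UART_BASE_RATE);
    unsigned divisor = UART_BASE_RATE / baud;
    uint8_t lcr = u->reg[UART_LCR];

    uart_sim_write(u, UART_LCR, (uint8_t)(lcr | LCR_SPEED_SET));
    uart_sim_write(u, UART_DATA, (uint8_t)(divisor & 0xFFu));
    uart_sim_write(u, UART_IER, (uint8_t)((divisor >> 8) & 0xFFu));
    uart_sim_write(u, UART_LCR, lcr);   /* offsets 0 and 1 are data/IER again */
}
```

With these assumptions, asking for 9600 gives a divisor of 12, stored entirely in the low half.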
(mechanisms: direct polled I/O)
all transfers happen under direct control of CPU
– CPU transfers data to/from device controller registers
– transfers are typically one byte or word at a time
– may be accomplished with normal or I/O instructions
CPU polls device until it is ready for data transfer
– received data is available to be read
– previously initiated write operations have been completed
advantages
– very easy to implement (both hardware and software)

Scenario: direct I/O with polling
void uart_write_char( char c ) {
    /* busy-wait until the previous transmit has completed */
    while( (inb(UART_LSR) & TR_DONE) == 0 )
        ;
    outb( UART_DATA, c );
}
char uart_read_char( void ) {
    /* busy-wait until a received byte is available */
    while( (inb(UART_LSR) & RX_READY) == 0 )
        ;
    return( inb(UART_DATA) );
}

performance of direct I/O
CPU intensive data transfers
– each byte or word transferred requires multiple instructions
CPU is wasted while awaiting completion of transfers
– busy-wait polling ties up CPU until I/O is completed
devices are idle while we are running other tasks
– I/O can only happen when an I/O task is running
how can problems be dealt with
– let controller transfer data without attention from CPU
– let application block pending I/O completion
– let controller interrupt CPU when I/O is finally done

Direct Memory Access – I/O w/o the CPU
bus facilitates data flow in all directions between
– CPU, memory, and device controllers
CPU can be the bus-master
– initiating data transfers with memory or device controllers
device controllers can also master the bus
– CPU instructs controller what transfer is desired
  what data to move to/from what part of memory
– device controller performs transfer w/o CPU assistance
– device controller generates interrupt at end of transfer
completion interrupts – waking up CPU
device controllers, busses, and interrupts
– busses have ability to send interrupts to the CPU
– devices signal controller when they are done/ready
– when device is done, controller asserts interrupt on bus
CPUs and interrupts
– interrupts look very much like traps
  traps come from CPU, interrupts are caused externally
– unlike traps, interrupts can be selectively enabled/disabled
  a device can be told it can or cannot generate interrupts
  special instructions can enable/disable interrupts to CPU

Interrupt Handling
[figure: the Application Program (... instr; instr; instr; instr; instr; instr ...) runs in user mode; an interrupt selects a PS/PC pair from the Interrupt vector table and enters the 1st level interrupt handler in supervisor mode, which uses the list of device interrupt handlers to call the 2nd level handler (device driver interrupt routine) before the return to user mode]

interrupts vs. traps
traps are caused by an instantaneous condition
– they are triggered when something happens
– there is (usually) no persistent state that must be cleared
interrupts are caused by a device being in some state
– they are triggered when the device enters a particular state
– they will continue to be asserted until device state changes
once delivered, an interrupt must be disabled
– CPU must ignore continuing request for that interrupt
– cause must be cleared, and interrupt acknowledged
– the device is changed from DONE to BUSY again

DMA read w/completion interrupts
/* requesting process */
lock(devlock);                  /* lock device */
/* program the DMA request */
dp->loc = req_loc;
dp->adr = req_adr;
dp->cnt = req_cnt;
dp->op = READ;
/* re-enable and await completion */
dp->ctrl = IENABLE | GO;
await(devcompletion);
/* request has completed */
/* update data read count */
req_xfr = req_cnt - dp->cnt;
unlock(devlock);                /* release device */

dev_intr_handler() {
    save = intr_enable(DISABLE);
    /* turn off device ability to interrupt */
    dp->ctrl = IDISABLE;
    /* wake up the requester */
    wakeup(devcompletion);
    intr_enable( save );
    /* tell intr dispatcher we're done */
    return( ACKNOWLEDGE_INTERRUPT );
}
(device I/O with completion interrupts)
requesting process checks to see if device is busy
– if idle, start the I/O operation, and await its completion
– if busy, wait for the device to become idle
I/O interrupt handler
– gathers completion information from the device
– posts completion awakening requester
when current device owner finishes using the device
– wake up the next requester
we'll talk about waiting and waking up in two weeks

mechanisms: memory mapped I/O
DMA may not be the easiest way to do I/O
consider a video game display adaptor
– continuous updates to isolated areas of the screen
implement as a bit-mapped display adaptor
– 1MB display controller sits on the CPU memory bus
– each byte of display memory corresponds to one pixel
– application uses ordinary stores to update display
low overhead per update, no interrupts to service
relatively easy to program
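
A byte-per-pixel display updated with ordinary stores can be sketched as follows. The 1024 x 1024 shape is an assumption chosen so that the buffer is 1MB, the function names are hypothetical, and a heap buffer stands in for display memory that a real system would map onto the memory bus.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FB_WIDTH  1024u
#define FB_HEIGHT 1024u   /* assumption: 1024 x 1024 x 1 byte = 1MB */

/* Stand-in for mapped display memory: a zeroed 1MB buffer (NULL on failure). */
uint8_t *fb_create(void)
{
    return calloc((size_t)FB_WIDTH * FB_HEIGHT, 1);
}

/* One pixel update is one ordinary store: no setup, no interrupt. */
void fb_set_pixel(volatile uint8_t *fb, unsigned x, unsigned y, uint8_t color)
{
    assert(x < FB_WIDTH && y < FB_HEIGHT);
    fb[(size_t)y * FB_WIDTH + x] = color;
}

/* Update an isolated w x h area whose top-left corner is (x, y). */
void fb_fill_rect(volatile uint8_t *fb, unsigned x, unsigned y,
                  unsigned w, unsigned h, uint8_t color)
{
    assert(x + w <= FB_WIDTH && y + h <= FB_HEIGHT);
    for (unsigned row = y; row < y + h; row++)
        for (unsigned col = x; col < x + w; col++)
            fb[(size_t)row * FB_WIDTH + col] = color;
}
```

The volatile qualifier is there because stores to device memory must not be optimized away; with a plain heap buffer it is harmless.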
trade-off: memory mapped vs. DMA
DMA performs large transfers efficiently
– better utilization of both the devices and the CPU
  device doesn't have to wait for CPU to do transfers
– but there is considerable per transfer overhead
  setting up the operation, processing completion interrupt
memory-mapped I/O has no start/finish overhead
– but every byte is transferred by a CPU instruction
DMA better for occasional large transfers
memory-mapped better for frequent small transfers
memory-mapped devices are more difficult to share

Smart Device Controller
[figure: the device driver uses I/O instructions or normal instructions to reach the device controller's control registers (on bus), which hold basic status, basic control, and buffer pointers and are accessed through the bus; the shared buffers (in memory) are accessed through DMA; the device controller reports back with I/O completion interrupts]
(I/O Mechanisms: smart controllers)
Smarter controllers can improve on basic DMA
they can queue multiple input/output requests
– when one finishes, automatically start next one
– reduce completion/start-up delays
– eliminate need for CPU to service interrupts
they can relieve CPU of other I/O responsibilities
– request scheduling to improve performance
– they can do automatic error handling & retries
they can better hide the details of underlying devices

Random vs. Sequential Access
Sequential access devices
– byte/block N must be read before byte/block N+1
– may be read/write once, or may be rewindable
– examples: magnetic tape, printer, keyboard
Random access devices
– possible to seek directly to any desired byte/block
– seeks may or may not be instantaneous
– examples: memory, magnetic disk, CD, graphics adaptor
They are used very differently
random access devices: disks
random access devices are much more interesting
– usage, performance, and scheduling techniques
key time sharing services depend on disk I/O
– program loading, file I/O, paging
– disk performance drives timesharing performance
disk I/O operations are subject to overhead
– higher overhead means fewer operations/second
– careful scheduling can reduce overhead
– clever scheduling can improve throughput and delay

Disk drive geometry
spindle
– a mounted assembly of circular platters
head assembly
– read/write head per surface, all moving in unison
track
– ring of data readable by one head in one position
cylinder
– corresponding tracks on all platters
sector
– logical records written within tracks
disk address = <cylinder / head / sector >
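
Each part of a disk address has a cost: reaching the cylinder takes a seek, reaching the sector takes rotational latency, and only then does the transfer happen. The sketch below adds these up and turns the total into operations per second. The struct and function names are hypothetical; the sample figures used in the note after the code (9 ms average seek, 4 ms average latency, about 0.4 ms to transfer an 8,000 byte block, 7200 RPM) are the ones quoted in the drive performance figures of these notes.

```c
#include <assert.h>

/* Hypothetical breakdown of one disk operation, all times in milliseconds. */
struct disk_access {
    double seek_ms;       /* move the heads to the cylinder */
    double rotate_ms;     /* wait for the sector to come around */
    double transfer_ms;   /* move the data */
};

/* Time for one full revolution at the given spindle speed. */
double rotation_ms(double rpm)
{
    assert(rpm > 0.0);
    return 60000.0 / rpm;
}

/* Total time for one operation. */
double access_total_ms(struct disk_access a)
{
    return a.seek_ms + a.rotate_ms + a.transfer_ms;
}

/* Higher overhead means fewer operations per second. */
double ops_per_second(struct disk_access a)
{
    double total = access_total_ms(a);
    assert(total > 0.0);
    return 1000.0 / total;
}
```

With the average figures the total is 13.4 ms, or roughly 75 operations per second, against 2,500 per second if seek and latency were zero. A 7200 RPM spindle takes about 8.3 ms per revolution, which is where the 0-8 ms latency range comes from.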
Disk Drive - Logical
Disk Drive – Physical
[figures: labels are Spindle, Motor, 5 platters, 10 surfaces (platter/surface 0 through 9), 10 heads, head positioning assembly, Cylinder, Track, Sectors]
Disk Drive Performance
  heads           10        platters        5
  cylinders       17,000    tracks/inch     18,000
  sectors/track   400       bytes/sector    512
  RPM             7200      speed           200Mb/sec
  seek time       2-15ms (average 9ms)
  latency         0-8ms (average 4ms)

time to read one 8,000 byte block
               seek    rotate   transfer   total
  best case    0ms     0ms      400 µs     400 µs
  worst case   15ms    8ms      400 µs     23.4ms (58X)
  average      9ms     4ms      400 µs     13.4ms (33X)

Optimizing disk performance
don't start I/O until disk is on-cyl/near sector
– I/O ties up the controller, locking out other operations
– other drives seek while one drive is doing I/O
minimize head motion
– do all possible reads in current cylinder before moving
– make minimum number of trips in small increments
encourage efficient data requests
– have lots of requests to choose from
– encourage cylinder locality
– encourage largest possible block sizes
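
How much head motion a scheduling policy saves can be measured as total cylinders travelled. The sketch below compares three policies on one request queue; the function names are hypothetical, and the look variant assumes the head sweeps upward first and reverses at the highest request.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_REQ 64   /* arbitrary queue limit for this sketch */

static int dist(int a, int b) { return abs(a - b); }

/* First Come First Served: visit requests in arrival order. */
int travel_fcfs(int start, const int *req, int n)
{
    int total = 0, pos = start;
    for (int i = 0; i < n; i++) {
        total += dist(pos, req[i]);
        pos = req[i];
    }
    return total;
}

/* Shortest Seek First: always visit the closest pending request. */
int travel_ssf(int start, const int *req, int n)
{
    assert(n >= 0 && n <= MAX_REQ);
    int done[MAX_REQ] = {0};
    int total = 0, pos = start;

    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (!done[i] &&
                (best < 0 || dist(pos, req[i]) < dist(pos, req[best])))
                best = i;
        total += dist(pos, req[best]);
        pos = req[best];
        done[best] = 1;
    }
    return total;
}

/* Look (elevator): sweep up to the highest request, then down to the lowest. */
int travel_look_up(int start, const int *req, int n)
{
    int hi = start, lo = start;
    for (int i = 0; i < n; i++) {
        if (req[i] > hi) hi = req[i];
        if (req[i] < lo) lo = req[i];
    }
    /* the downward leg is needed only if something lies below the start */
    return (hi - start) + (lo < start ? hi - lo : 0);
}
```

Run on the queue from the head-travel figure in these notes (head at cylinder 76, requests 124, 17, 269, 201, 29, 137, 12), the three functions return 880, 321, and 450, matching the totals shown there.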
Head Travel under various algorithms
(head starts at cylinder 76; requests arrive as 124, 17, 269, 201, 29, 137, 12)
First Come First Served
  order:  76  124   17  269  201   29  137   12
  moves:       48  107  252   68  172  108  125    Tot=880
Shortest Seek First
  order:  76   29   17   12  124  137  201  269
  moves:       47   12    5  112   13   64   68    Tot=321
Scan/look (elevator algorithm)
  order:  76  124  137  201  269   29   17   12
  moves:       48   13   64   68  240   12    5    Tot=450

For the next lecture
read sections 6-6.3
(see Greek to English dictionary regarding figure 6-3)
there will be a quiz on the reading
topics for the next lecture
user view of processes
– process address spaces
– object modules, load modules, linkage editing
– procedure calls, stack frames, system calls, signals

key points
supervisor mode execution, privileged instructions
trap and interrupt handling
– save/restore, vectoring 1st and 2nd level handlers
busses, devices, controllers, interconnections
I/O mechanisms, what they are, how they work
– polled I/O, direct I/O, memory mapped I/O, DMA
– interrupt driven I/O, smart controllers
random access devices
– disk geometry, disk performance, disk scheduling

Channel Controllers – I/O co-processors
channels sit between CPU and I/O devices
– think of them as extremely smart busses
– they include highly specialized CPUs
they execute channel I/O programs
– instructions to read, write and control devices
– instructions to generate progress interrupts
– command chaining
– data chaining
once started, I/O programs execute w/o CPU attention
Typical Channel Architecture
[figure: the CPU connects over the Main bus to Channel Controller 0x1??, Channel Controller 0x2??, ...; channel 0x1?? connects to Device Controller 0x11? through Device Controller 0x1F?; Device Controller 0x11? connects to Device 0x110 through Device 0x11F]
all channels, controllers and devices have "Geographic" addresses

Typical Channel Program
(both programs located in main memory)
Main CPU
        ...
        SIO   0x101, iopgm
        ...
intr:   TIO   0x101
        ...
Channel Controller
iopgm   SEEK  cyl=1020, hd=5, rec=10
        READ  buf=xxx, cnt=4096
        READX buf=yyy, cnt=4096, intr
        TIC   next
        ...
next    SEEK  cyl=1050, hd=0, rec=2
        WRITE buf=zzz, cnt=8192, intr
        END
(note, channel can concurrently execute
one program per controller)
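
How a channel steps through such a program without CPU attention can be sketched as a tiny interpreter. Everything here is hypothetical and simplified: commands carry no buffers or disk addresses, TIC is read as a branch to another command in the same program, and the only things counted are commands executed and progress interrupts raised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified channel command set. */
enum ch_op { CH_SEEK, CH_READ, CH_WRITE, CH_TIC, CH_END };

struct ch_cmd {
    enum ch_op op;
    int intr;      /* nonzero: raise a progress interrupt on completion */
    int target;    /* CH_TIC only: index of the next command */
};

struct ch_result { int commands; int interrupts; };

/* Execute commands starting at index start until END or the program edge. */
struct ch_result channel_run(const struct ch_cmd *pgm, int len, int start)
{
    assert(pgm != NULL && len > 0);
    struct ch_result r = { 0, 0 };
    int pc = start;

    /* the command limit guards against a TIC that loops forever */
    while (pc >= 0 && pc < len && r.commands < 1000) {
        const struct ch_cmd *c = &pgm[pc];
        r.commands++;
        if (c->intr)
            r.interrupts++;
        if (c->op == CH_END)
            break;
        pc = (c->op == CH_TIC) ? c->target : pc + 1;   /* command chaining */
    }
    return r;
}
```

Encoding the program above as seven commands (SEEK, READ, READ with intr, TIC to index 4, SEEK, WRITE with intr, END) and running it from index 0 executes all seven and raises two interrupts; the CPU is involved only to start the program and to field those interrupts.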