CSE 331 Computer Organization and Design
Fall 2007, Week 13
Section 1: Mary Jane Irwin (www.cse.psu.edu/~mji)
Section 2: Krishna Narayanan
Course material on ANGEL: cms.psu.edu
[adapted from D. Patterson slides]
CSE331 W13.1 Irwin Fall 2007 PSU

Heads Up
Last week's material
  Multicycle MIPS datapath and control path implementation, microprogramming
This week's material
  Input/Output: dealing with exceptions and interrupts
  - Reading assignment: PH 5.6, 8.1, 8.5, A.7-A.8
Next week's material
  Intro to pipelined datapath design
  - Reading assignment: PH 6.1
Reminders
  HW 7 is due Monday, Dec 3rd (by 11:55pm)
  Quiz 7 closes Tues., Dec 4th (by 11:55pm)
  Final Exam is Tues., Dec 18th, 10:10 to noon, 112 Walker
  Dec. 12th deadline for filing grade corrections/updates

"Well, surely a higher college GPA ought to be correlated with career success … the data didn't bear this out. … fast-trackers have identifiable qualities and show definite trends. One quality is a whatever-it-takes attitude … a willingness to do whatever it takes to make a project succeed … If that means working weekends … they do it. Another quality is a solid, unflappable understanding of all the technologies they are using or developing. … Finally, these high-output types seemed to innately grasp that they are members of a large team … they are the ones always helping everyone else."
The Pentium Chronicles, Colwell, pg. 140

Major Components of a Computer
[Figure: Processor (Control, Datapath), Memory, and Devices (Input, Output)]
Important metrics for an I/O system
  Performance
  Compatibility
  Expandability and diversity
  Dependability
  Cost, size, weight

Input and Output Devices
I/O devices are incredibly diverse with respect to
  Behavior: input, output or storage
  Partner: human or machine
  Data rate: the peak rate at which data can be transferred between the I/O device and the main memory or processor

  Device            Behavior          Partner   Data rate (Mb/s)
  Keyboard          input             human     0.0001
  Mouse             input             human     0.0038
  Laser printer     output            human     3.2000
  Graphics display  output            human     800.0000-8000.0000
  Network/LAN       input or output   machine   100.0000-1000.0000
  Magnetic disk     storage           machine   240.0000-2560.0000

The data rates span 8 orders of magnitude.

Input/Output in SPIM via System Calls
SPIM provides a small set of operating-system-like services through the syscall instruction
  Load the system call code into register $v0 and the arguments into registers $a0 through $a3
  Return values are put in register $v0

  Service        Code   Args                          Results
  print_int      1      $a0 = integer
  print_string   4      $a0 = string
  read_int       5                                    integer in $v0
  read_string    8      $a0 = buffer, $a1 = length
  print_char     11     $a0 = char
  read_char      12                                   char in $v0 (some references list $a0)

Communication of I/O Devices and Processor
How the processor directs the I/O devices
  Special I/O instructions
  - Must specify both the device and the command
  Memory-mapped I/O
  - Portions of the high-order memory address space are assigned to each I/O device.
  - Reads (lw) and writes (sw) to those memory addresses are interpreted as commands to the I/O devices
  - Loads/stores to the I/O address space are done only by the OS
How the I/O device communicates with the processor
  Polling: the processor periodically checks the status of an I/O device to determine its need for service
  - Processor is totally in control, but does all the work
  - Can waste a lot of processor time due to speed differences
  Interrupt-driven: the I/O device issues an interrupt to the processor to indicate that it needs attention

"Real" I/O in SPIM
SPIM supports one memory-mapped I/O device: a terminal with two independent units
  Receiver reads characters from the keyboard
  Transmitter writes characters to the display
[Figure: Processor (Control, Datapath), Memory, and Devices (Receiver, Transmitter)]

Review: MIPS (spim) Memory Allocation
[Figure: memory map of 2^30 words, listed here from the highest address to the lowest]
  ffff fffc   top of memory
  ffff 0000   Memory-mapped I/O
  8000 0080   Kernel Code & Data
  7fff effc   Stack ($sp)
              Dynamic data
  1000 8000   ($gp) (1004 0000)
  1000 0000   Static data
  0040 0000   User Code (PC)
  0000 0000   Reserved

Terminal Receiver (Input) Control with SPIM
Input is controlled via two memory-mapped device registers (i.e., each is a special memory location)
  Receiver data (0xffff0004):    bits 7-0 = received byte from keyboard (read only); upper bits unused
  Receiver control (0xffff0000): bit 1 = interrupt enable; bit 0 = ready (read only); upper bits unused
The keyboard inputs into the Receiver data register, which sets the ready bit in the Receiver control register (i.e., the keyboard input is ready to be read by the program)
Reading the next input character from the Receiver data register resets the ready bit in the Receiver control register

Terminal Output Control with SPIM
Output is controlled via two memory-mapped device registers (i.e., each is a special memory location)
  Transmitter data (0xffff000c):    bits 7-0 = transmitted byte to display; upper bits unused
  Transmitter control (0xffff0008): bit 1 = interrupt enable; bit 0 = ready (read only); upper bits unused
The display outputs the character in the Transmitter data register, which sets the ready bit in the Transmitter control register (i.e., the display is ready to accept a new output character)
Writing the next character to output into the Transmitter data register resets the ready bit in the Transmitter control register

MIPS I/O Instructions
MIPS has 2 coprocessors: Coprocessor 0 handles exceptions, including input and output interrupts; Coprocessor 1 handles floating point
Coprocessors have their own register sets, so there are instructions to move values between these registers and the CPU's registers

  mfc0 rt, rd    #move from coprocessor 0    fields: 0x10 | 0 | rt | rd | 0 | 0
  mtc0 rt, rd    #move to coprocessor 0      fields: 0x10 | 4 | rt | rd | 0 | 0

  Register   #    Use
  BadVAddr   8    bad mem addr
  Count      9    timer
  Compare    11   timer compare
  Status     12   intr mask & enable bits
  Cause      13   excp type and pending intr's
  EPC        14   addr of instr causing excp

Polling in SPIM
Be sure that memory-mapped I/O is enabled (through the PCSpim "Settings" dialog box)

        li    $t0, 0xffff0000     #recv ctrl
        li    $t1, 0xffff0004     #recv data
        li    $t2, 0xffff0008     #trans ctrl
        li    $t3, 0xffff000c     #trans data
        mtc0  $zero, $12          #disable interrupts
  I1:   lw    $t4, 0($t0)         #poll recv ready bit
        andi  $t4, $t4, 1
        beq   $t4, $zero, I1      #loop til recv ready
        lw    $t6, 0($t1)         #read input character
  I2:   lw    $t4, 0($t2)         #poll trans ready bit
        andi  $t4, $t4, 1
        beq   $t4, $zero, I2      #loop til trans ready
        sw    $t6, 0($t3)         #echo (print) character

The Downsides of Polling
Input and output devices are very slow compared to the processor
These time lags are simulated in SPIM, which measures time in instructions executed, not in real clock time
After the transmitter starts to write a character, the transmitter's ready bit becomes 0. It doesn't become ready again until the processor has executed a (large) fixed number of instructions.
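To get a feel for the cost of this wait, here is a minimal Python sketch of the busy-wait. It is only a model, not SPIM itself: the function name poll_until_ready, the 3-instruction loop length (the lw/andi/beq sequence in the polling code), and the 1000-instruction device delay are assumptions chosen for illustration. The slides say only that the delay is a large fixed number of instructions.

```python
def poll_until_ready(delay_instructions, loop_length=3):
    """Model a busy-wait loop on a device ready bit.

    delay_instructions: hypothetical number of instructions that must
        execute before the device sets its ready bit again.
    loop_length: instructions per polling iteration (lw, andi, beq).
    Returns (iterations, instructions_executed).
    """
    executed = 0
    iterations = 0
    while True:
        executed += loop_length      # one pass through lw / andi / beq
        iterations += 1
        if executed >= delay_instructions:
            # the ready bit is now 1, so the beq falls through
            return iterations, executed


if __name__ == "__main__":
    iterations, executed = poll_until_ready(delay_instructions=1000)
    print("polling iterations:", iterations)
    print("instructions spent waiting:", executed)
```

Under these assumed numbers, every instruction counted is spent re-reading a status bit that has not changed, which is the waste the slide describes.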
(You don't want to single-step the simulator!)
Polling will execute the "loop til ready" code thousands of times. While the input or output is occurring, nothing else can be done: a waste of resources.
There is a better way

I/O Interrupts
An I/O interrupt is used to signal an I/O request for service
  Can have different urgencies (so may need to be prioritized)
  Need to identify the device generating the interrupt
An I/O interrupt is asynchronous with respect to instruction execution
  An I/O interrupt is not associated with any instruction and does not prevent any instruction from completing
  - You can pick your own convenient point to take an interrupt
Advantage
  User program progress is only halted during the actual transfer of I/O data to/from user memory space
Disadvantage: special hardware is needed to
  Cause an interrupt (I/O device)
  Detect an interrupt and save the proper information to resume after servicing the interrupt (processor)

Interrupt Driven Input
[Figure: Keyboard and Receiver, Processor, and Memory holding the user program (add, sub, and, or, beq) and the input interrupt service routine (lbu, sb, ..., jr)]
  1.  input interrupt
  2.1 save PC
  2.2 jump to interrupt service routine
  2.3 service interrupt
  2.4 return to user code

Interrupt Driven Input in SPIM
1. the Receiver indicates with an interrupt that it has input a new character from the keyboard into the Receiver data register
   - writing to the Receiver data register sets the Receiver control register ready bit to 1
   Receiver data (0xffff0004):    unused | received byte (65 in the example)
   Receiver control (0xffff0000): unused | interrupt enable = 1 | ready = 1
2. the user process responds to the interrupt by transferring control to an interrupt service routine that copies the input character into the user memory space
   - reading the Receiver data register resets the Receiver control register ready bit to 0

Interrupt Driven Output
[Figure: Display and Transmitter, Processor, and Memory holding the user program (add, sub, and, or, beq) and the output interrupt service routine (lbu, sb, ..., jr)]
  1.  output interrupt
  2.1 save PC
  2.2 jump to interrupt service routine
  2.3 service interrupt
  2.4 return to user code

Interrupt Driven Output in SPIM
1. the Transmitter indicates with an interrupt that it has successfully output the character in the Transmitter data register to the display
   - the display reading from the Transmitter data register sets the Transmitter control register ready bit to 1
   Transmitter data (0xffff000c):    unused | transmitted byte (65 in the example)
   Transmitter control (0xffff0008): unused | interrupt enable = 1 | ready = 1
2. the user process responds to the interrupt by transferring control to an interrupt service routine that writes the next character to output from the user memory space into the Transmitter data register
   - writing to the Transmitter data register resets the Transmitter control register ready bit to 0

Additions to MIPS ISA for I/O
Coprocessor 0 records the information the software needs to handle exceptions (including interrupts)
  EPC (register 14): holds the address+4 of the instruction that was executing when the exception occurred
  Status (register 12): exception mask and enable bits
    bits 15-8 = Intr Mask | bit 4 = User mode | bit 1 = Excp level | bit 0 = Intr enable
  - Intr Mask = 1 bit for each of 6 hw and 2 sw exception levels (1 enables exceptions at that level, 0 disables them)
  - User mode = 0 if running in kernel mode when the exception occurred; 1 if running in user mode (fixed at 1 in SPIM)
  - Excp level = set to 1 (disable exceptions) when an exception occurs; should be reset by the exception handler when done
  - Intr enable = 1 if exceptions are enabled; 0 if disabled

Additions to MIPS ISA, Con't
  Cause (register 13): exception pending and type bits
    bit 31 = Branch delay | bits 15-8 = Pending exception (PI) | bits 6-2 = Exception code
    PI3 = recv intr, PI2 = trans intr
  - PI: bits set if an exception occurs but is not yet serviced, so the processor can handle more than one exception occurring at the same time, or record exception requests when exceptions are disabled
  - Exception code: encodes the reason for the exception
      0  (INT)   external interrupt (I/O device request)
      4  (AdEL)  address error trap (load or instr fetch)
      5  (AdES)  address error trap (store)
      6  (IBE)   bus error on instruction fetch trap
      7  (DBE)   bus error on data load or store trap
      8  (Sys)   syscall trap
      9  (Bp)    breakpoint trap
      10 (RI)    reserved (or undefined) instruction trap
      12 (Ov)    arithmetic overflow trap

MIPS Exception Return Instruction
Exception return: sets the Excp level bit in coprocessor 0's Status register to 0 (re-enabling exceptions) and returns to the instruction pointed to by coprocessor 0's EPC register

  eret    #return from exception    fields: 0x10 | 1 | 0 | 0 | 0 | 0x18

Example I/O Interrupts in SPIM - Enable

        li    $t0, 0xffff0000       #recv ctrl
        li    $t1, 0xffff0004       #recv data
        li    $t2, 0xffff0008       #trans ctrl
        li    $t3, 0xffff000c       #trans data
        mfc0  $t4, $13
        andi  $t4, $t4, 0xffff00ff  #clear Pending interrupt
        mtc0  $t4, $13              #(PI) bits in Cause reg
        li    $t4, 0x2
        sw    $t4, 0($t0)           #enable recv interrupts
        sw    $t4, 0($t2)           #enable trans interrupts
        mfc0  $t4, $12
        ori   $t4, $t4, 0xff01      #enable intr and mask
        mtc0  $t4, $12              #in Status reg
        #do something useful while I/O is taking place
        #when I/O interrupts occur transfer control to
        #exception handler (at address 0x80000180)

Example I/O Interrupts in SPIM - Handler

        .ktext 0x80000180
        mfc0  $t4, $13               #get ExcpCode from Cause
        srl   $t5, $t4, 2
        andi  $t5, $t5, 0x1f         #ExcpCode in $t5, if 0
        bne   $t5, $zero, excp       #then I/O intr has occurred
  ck_recv:
        andi  $t5, $t4, 0x800        #check for PI3 (input),
        beq   $t5, $zero, ck_trans   #if 0, then trans intr
  I1:   lw    $t5, 0($t0)            #check recv ready
        andi  $t5, $t5, 1
        beq   $t5, $zero, no_recv_ready
        lw    $t6, 0($t1)            #input character into $t6
        andi  $t4, $t4, 0xfffff7ff   #clear PI3 bit in Cause reg
        mtc0  $t4, $13

Example I/O Interrupts in SPIM - Handler, con't

  ck_trans:
        beq   $t6, $zero, ret_hand   #no character to echo yet
        andi  $t5, $t4, 0x400        #check for PI2 (output)
        beq   $t5, $zero, ret_hand   #if 0, then no trans intr
  I2:   lw    $t5, 0($t2)            #check trans ready
        andi  $t5, $t5, 1
        beq   $t5, $zero, no_trans_ready
        sw    $t6, 0($t3)            #echo character to display
        mfc0  $t4, $13
        andi  $t4, $t4, 0xfffffbff   #clear PI2 bit in Cause reg
        mtc0  $t4, $13
  ret_hand:
        mfc0  $t4, $12
        ori   $t4, $t4, 0xff01       #enable intr and mask
        mtc0  $t4, $12               #in Status reg
        eret                         #return from intr

"… designed the P6 frontside bus to be transaction-oriented. Chips that connected to the bus were known as bus agents … This transaction orientation is [now] a standard feature of most modern microprocessor buses, despite its complexity and the implication that all bus agents must continuously monitor the bus and track the overall state … it's a good trade-off between the expense (wires, motherboard routing, and CPU package pins) and inexpensive (transistors on the CPU)"
The Pentium Chronicles, Colwell, pg.
76

Exceptions in General
[Figure: a user program following normal control flow (sequential, jumps, branches, calls, returns) is diverted by an exception to the system exception handler, which later returns from the exception]
Exception = unprogrammed control transfer
  system takes action to handle the exception
  - must record the address of the offending or next-to-execute instruction and save (and restore) user state
  returns control to user after handling the exception

Two Types of Exceptions
Interrupts
  caused by external events (i.e., request from I/O device)
  asynchronous to program execution
  may be handled between instructions
  simply suspend and resume user program
Traps
  caused by internal events
  - exceptional conditions (e.g., arithmetic overflow, undefined instr.)
  - errors (e.g., hardware malfunction, memory parity error)
  - faults (e.g., non-resident page: page fault)
  synchronous to program execution
  condition must be remedied by the trap handler
  instruction may be retried (or simulated) and program continued, or program may be aborted

Additions to MIPS ISA for Interrupts
Control signals to write EPC (EPCWrite), Cause and Status (Cause&StatusWrite)
Hardware to record the type of interrupt in Cause
Modify the finite state machine so that
  the address of the interrupt handler (8000 0180hex) can be loaded into the PC, so must increase the size of the PC mux
  and save the address of the next instr in EPC

Interrupt Modified Multicycle Datapath
[Figure: the multicycle datapath extended with EPC, Cause, and Status registers; the PC source mux gains a fourth input carrying the handler address 8000 0180; the control unit gains the EPCWrite and Cause&StatusWrite signals]

Interrupt Modified FSM
[Figure: the multicycle control FSM with a new state 11; the outputs of each state are listed below]
  State 0  (Instr Fetch):            MemRead; IRWrite; IorD = 0; ALUSrcA = 0; ALUSrcB = 01; ALUOp = 00; PCSource = 00; PCWrite
  State 1  (Decode):                 ALUSrcA = 0; ALUSrcB = 11; ALUOp = 00
  State 2  (Execute, lw/sw address): ALUSrcA = 1; ALUSrcB = 10; ALUOp = 00
  State 3  (Memory Access, lw):      MemRead; IorD = 1
  State 4  (Write Back, lw):         RegDst = 0; RegWrite; MemtoReg = 1
  State 5  (Memory Access, sw):      MemWrite; IorD = 1
  State 6  (Execute, R-type):        ALUSrcA = 1; ALUSrcB = 00; ALUOp = 10
  State 7  (R-type completion):      RegDst = 1; RegWrite; MemtoReg = 0
  State 8  (Execute, beq):           ALUSrcA = 1; ALUSrcB = 00; ALUOp = 01; PCSource = 01; PCWriteCond
  State 9  (Execute, jump):          PCSource = 10; PCWrite
  State 11 (new; entered when an interrupt is pending): Cause&StatusWrite; EPCWrite; PCWrite; IntrOrExcp = 0; PCSource = 11

Additions to MIPS ISA for Traps
Control signals to write EPC (EPCWrite & IntrOrExcp), Cause and Status (Cause&StatusWrite)
Hardware to record the type of trap in Cause
Further modify the finite state machine so that, for traps, it
  records the address of the current (offending) instruction in the EPC, so must undo the PC = PC + 4 done during fetch

Trap Modified Multicycle Datapath
[Figure: the interrupt-modified datapath with an added IntrOrExcp control signal that selects what is written into EPC]

How Control Detects Two Traps
Undefined instruction (RI): detected when no next state is defined in state 1 (decode) for the opcode value
  Define the next state value for all undefined op values as new state 10
Arithmetic overflow (Ov): the overflow signal from the ALU is used in state 6 (if we don't want to complete RegWrite)
Need to modify the FSM in a similar fashion for the remaining traps
Challenge is to handle the interactions between instructions and exception-causing events so that the control logic remains small and fast
  - Complex interactions make the control unit the most challenging aspect of hardware design, especially in pipelined processors

Trap Modified FSM
[Figure: the interrupt-modified FSM with a second new state 10; states 0 through 9 and state 11 keep the outputs of the interrupt-modified FSM]
  State 6 goes to state 7 on No Overflow and to state 10 on Overflow
  State 10 (new; entered for traps): Cause&StatusWrite; ALUSrcA = 0; ALUSrcB = 01; ALUOp = 01; EPCWrite; PCWrite; IntrOrExcp = 1; PCSource = 11
  State 11 (entered when an interrupt is pending): Cause&StatusWrite; EPCWrite; PCWrite; IntrOrExcp = 0; PCSource = 11

I/O System Interconnect Issues
[Figure: Processor and Cache on a Memory - I/O Bus with Main Memory and I/O Controllers for two Disks, a Terminal, and a Network; interrupt signals run from the I/O Controllers to the Processor]
A system usually has more than one I/O device, connected to the processor via a bus
  each I/O device is controlled by an I/O Controller

Buses
A bus is a shared communication link (a single set of wires used to connect multiple subsystems) that needs to support a range of devices with widely varying latencies and data transfer rates
  Advantages
  - Versatile: new devices can be added easily and can be moved between computer systems that use the same bus standard
  - Low cost: a single set of wires is shared in multiple ways
  Disadvantages
  - Creates a communication bottleneck: bus bandwidth limits the maximum I/O throughput
The maximum bus speed is largely limited by
  The length of the bus
  The number of devices on the bus

I/O Performance Measures
I/O bandwidth (throughput): amount of information that can be input (output) and communicated across an interconnect (e.g., a bus) to the processor/memory (I/O device) per unit time
  1. How much data can we move through the system in a certain time?
  2. How many I/O operations can we do per unit time?
I/O response time (latency): the total elapsed time to accomplish an input or output operation
  An especially important performance metric in real-time systems
Many applications require both high throughput and short response times

Types of Buses
Processor-memory bus (proprietary)
  Short and high speed
  Matched to the memory system to maximize the memory-processor bandwidth
  Optimized for cache block transfers
Backplane bus (industry standard, e.g., ATA, PCIexpress)
  The backplane is an interconnection structure within the chassis
  Used as an intermediary bus connecting I/O buses to the processor-memory bus
I/O bus (industry standard, e.g., SCSI, USB, Firewire)
  Usually is lengthy and slower
  Needs to accommodate a wide range of I/O devices
  Connects to the processor-memory bus or backplane bus

Example: The Pentium 4's Buses
[Figure: Pentium 4 bus organization]
  System Bus ("Front Side Bus"): 64b x 800 MHz (6.4 GB/s), 533 MHz, or 400 MHz
  Memory Controller Hub ("Northbridge")
    DDR SDRAM Main Memory
    Graphics output: 2.0 GB/s
    Gbit ethernet: 0.266 GB/s
  Hub Bus: 8b x 266 MHz
  I/O Controller Hub ("Southbridge")
    2 serial ATAs: 150 MB/s
    2 parallel ATA: 100 MB/s
    PCI: 32b x 33 MHz
    8 USBs: 60 MB/s

Bus Bandwidth Determinants
The bandwidth of a bus is determined by
  Whether it is synchronous or asynchronous, and the timing characteristics of the protocol used
  The bus width (i.e., number of data lines)
  Whether the bus supports block transfers or only word-at-a-time transfers

                   Firewire               USB 2.0
  Type             I/O                    I/O
  Data lines       4                      2
  Clocking         Asynchronous           Synchronous
  Max # devices    63                     127
  Max length       4.5 meters             5 meters
  Peak bandwidth   50 MB/s (400 Mbps)     0.2 MB/s (low)
                   100 MB/s (800 Mbps)    1.5 MB/s (full)
                                          60 MB/s (high)

Buses in Transition
Companies are transitioning from synchronous, parallel, wide buses to asynchronous narrow buses
  Reflections on wires and clock skew make it difficult to use 16 to 64 parallel wires running at a high clock rate (e.g., ~400 MHz), so companies are transitioning to buses with a few one-way wires running at a very high "clock" rate (~2 GHz)

                    PCI               PCIexpress    ATA          Serial ATA
  Total # wires     120               36            80           7
  # data wires      32-64 (2-way)     2x4 (1-way)   16 (2-way)   2x2 (1-way)
  Clock (MHz)       33-133            635           50           150
  Peak BW (MB/s)    128-1064          300           100          375 (3 Gbps)

ATA Cable Sizes
[Figure: Serial ATA cables (red) are much thinner than parallel ATA cables (green)]
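
The width-times-clock figures quoted for the Pentium 4 buses can be checked with a few lines of arithmetic. The Python sketch below reproduces that calculation. The function name peak_bandwidth_mb_per_s and the one-transfer-per-clock default are assumptions made for illustration, and the result is only a peak figure: protocol overhead means a real bus delivers less.

```python
def peak_bandwidth_mb_per_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth in MB/s for a parallel bus.

    width_bits: number of data lines.
    clock_mhz: bus clock in MHz.
    transfers_per_clock: assumed 1 unless the bus moves data more than
        once per clock (hypothetical parameter, not from the slides).
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


if __name__ == "__main__":
    # (width in bits, clock in MHz) as quoted on the Pentium 4 slide
    buses = {
        "System bus (front side bus)": (64, 800),
        "Hub bus": (8, 266),
        "PCI": (32, 33),
    }
    for name, (width, clock) in buses.items():
        mb_per_s = peak_bandwidth_mb_per_s(width, clock)
        print(f"{name}: {mb_per_s:.0f} MB/s")
```

For the system bus, 64 bits at 800 MHz works out to 6400 MB/s, which is the 6.4 GB/s quoted on the slide.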