CPSC 3300 – Spring 2015 – Exam 2                    Name: ______________________

1. Matching. Write the correct term from the list into each blank. (2 pts. each)

      control signal      control store     hardwired        microprogrammed
      structural hazard   data hazard       control hazard   forwarding

   (a) _____________________ when hardware cannot support the combination of
       instructions we want to execute in the same clock cycle

   (b) _____________________ providing a data value to any unit where it is
       needed after the data value has been produced but before it is
       available in the register file

   (c) _____________________ value used for selecting a mux input or selecting
       the operation of a functional unit

True/False. Circle T or F. (2 pts. each)

2. T / F  A microprogrammed control unit is typically faster than a hardwired
          control unit.

3. T / F  Predict-untaken is easier to implement than predict-taken since a
          branch target address is not needed.

4. T / F  Branch prediction combined with speculative execution requires some
          form of branch misprediction recovery hardware, such as a branch
          history shift register (BHSR).

5. T / F  In a VLIW computer system, dependency checking between instructions
          is done by the hardware.

[Figure: single-cycle datapath (Figure 4.2 from textbook); labels ❶ ❷ ❸ appear on the figure]

6. Consider the MIPS “subtract” instruction as implemented on the single-cycle
   datapath above (Figure 4.2 from textbook):

      subtract R3, R1, R2    // Reg[3] <- Reg[1] - Reg[2]

   Circle the correct value 0 or 1 for the control signals (a-d) and circle
   whether each of the three muxes (e-g) selects its upper input, lower input,
   or don't care. For the ALU operation (h) circle one of the function names.
   (The Zero condition signal will be assumed to be 0.) (8 pts.)
   (a) Branch   =  0  1
   (b) MemRead  =  0  1
   (c) MemWrite =  0  1
   (d) RegWrite =  0  1

   (e) Mux1 (upper left; output to PC)                    = upper, lower, don't care
   (f) Mux2 (upper middle; output to Data port of Regs)   = upper, lower, don't care
   (g) Mux3 (lower middle; output to bottom leg of ALU)   = upper, lower, don't care

   (h) ALU operation = and, or, add, subtract, set-on-less-than, nor

7. Briefly explain what each stage does when the “subtract R3,R1,R2”
   instruction is executed by the five-stage pipeline. (15 pts.)

8. The branch CPI penalty is calculated as

      extra CPI = (branch freq.)*(misprediction freq.)*(mispredict penalty).

   Consider a five-stage pipeline with static predict-untaken and where the
   branch target address and the branch direction are resolved at the end of
   the EX stage. What is the mispredict penalty? (5 pts.)

9. Consider using a two-bit saturating counter for branch prediction. Assume
   the state is initialized to binary 00. Each taken branch (T) increments the
   counter unless the state is already binary 11. Each untaken branch (U)
   decrements the counter unless the state is already binary 00. What is the
   state of the predictor after the branch sequence “T T T T U”? (4 pts.)

10. For a loop branch with the sequence of “T T T T U T T T T U”, give the
    accuracy of the following predictors.

    (a) static predict taken (2 pts.)

    (b) static predict untaken (2 pts.)

    (c) dynamic predict using a one-bit history initialized to 0
        (untaken = 0, taken = 1) (4 pts.)

    (d) dynamic predict using a 2-bit saturating counter (see 9 above)
        initialized to 00. States 00 and 01 predict untaken, and states 10
        and 11 predict taken. (6 pts.)

11. Draw the data dependency diagram for the register data flow in the
    following MIPS code. Destination registers are listed first. (16 pts.)

      add r3, r1, r2     // r3 <- r1 + r2
      lw  r4, 8(r3)      // r4 <- memory[r3+8]
      sub r6, r4, r5     // r6 <- r4 - r5
      xor r7, r4, r6     // r7 <- r4 ^ r6

12.
    For the MIPS instruction sequence given in question 11, show the pipeline
    cycle (“staircase”) diagram for the standard 5-stage pipeline with
    forwarding. (9 pts.)

      add r3, r1, r2
      lw  r4, 8(r3)
      sub r6, r4, r5
      xor r7, r4, r6

13. Consider the following datapath. (Assume all registers are edge-triggered
    and thus immune from races.) Control signal identifiers are given for the
    in and out control points of the registers. Additional control signals
    include memory signals Mem, R (read), W (write), and 3-bit ALU function
    field F.

      ALU functions (three-bit F field)
      ---------------------------------
      000: C = A + B       100: C = A - B
      001: C = A           101: C = not A
      010: C = A + 1       110: C = A - 1
      011: C = A << 1      111: C = A >> 1

    Complete the step-by-step RTL and the control signal sequence to fetch and
    execute an add instruction “add X”. Assume that the instruction is
    composed of two memory words: a one-word opcode followed by a one-word
    address. Assume also that the address of the instruction is in the PC, and
    that the memory is word-addressable. The actions of the instruction are
    ACC <- ACC + memory[X], for the memory address X given in the second word
    of the instruction. (15 pts.)

      // fetch opcode and place in IR     // control signals
      MAR <- PC                           5 (A=PC), F=001 (C=A), 10 (MAR=C)
      PC  <- PC + 1                       5 (A=PC), F=010 (C=A+1), 11 (PC=C)
      MBR <- memory[MAR]                  Mem, R
      IR  <- MBR                          1 (A=MBR), F=001 (C=A), 13 (IR=C)

XC. For the datapath depicted below, assume the following latencies (delays):

      1 nsec  to select and transfer a register value across the bus
      2 nsec  setup and hold time for temporary register W, Y, or Z (i.e., write into)
      3 nsec  setup and hold time for register file R0-R3 (i.e., write into)
      4 nsec  incrementer
      6 nsec  ALU operation

      +------+    .-.   +-------------+
      |  R0  |<-->| |-->| incrementer |--.
      +------+    | |   +-------------+  |
      |  R1  |<-->| |                    v
      +------+    | |                +-------+
      |  R2  |<-->| |<---------------|   W   |
      +------+    | |                +-------+
      |  R3  |<-->| |
      +------+    | |--------------------.
                  | |                    |
                  | |   +-------+        |
                  | |-->|   Y   |        |
                  | |   +-------+        |
                  | |       v            v
                  | |   --------      --------
                  | |   \       \____/       /
                  | |    \                  /
                  | |     \      ALU       /
                  | |      \______________/
              bus | |             v
                  | |         +-------+
                  | |<--------|   Z   |
                  `-'         +-------+

    There is one datapath action per clock cycle, according to these rules:

    (1) a datapath action starts with a register value being placed on the bus
        and ends with a register being written
    (2) the action may be merely transferring data from one register to
        another, or there may be a computation performed in between the
        register accesses (e.g., increment or addition)
    (3) the Y register cannot be used to pass through a value from the bus to
        the ALU in the same cycle (i.e., it cannot be written and re-read in
        the same cycle)
    (4) only one value may be placed on the bus in a given cycle

    For example, one datapath action is W<-R0+1 and the path is “R0 to bus to
    incrementer to W”. There is no extra cost to read from a register so the
    path delay is 1 + 4 + 2 = 7 nsec, where the bus select and transfer takes
    1 nsec, the incrementer takes 4 nsec, and writing into temporary register
    W takes 2 nsec.

    Consider the paths and path delays across the range of possible datapath
    actions. What is the critical path that limits the clock frequency?
    (5 pts.)
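
    The path-delay arithmetic in the worked example (W<-R0+1) can be sketched
    in a few lines of Python. This is only an illustrative sketch: the
    dictionary keys and the function name path_delay are assumptions made for
    this example and are not part of the exam; only the latency values come
    from the problem statement.

```python
# Latencies in nsec, copied from the problem statement.
# The key names are hypothetical labels chosen for this sketch.
LATENCY_NS = {
    "bus": 1,            # select and transfer a register value across the bus
    "temp_write": 2,     # setup and hold time for temporary register W, Y, or Z
    "regfile_write": 3,  # setup and hold time for register file R0-R3
    "incrementer": 4,
    "alu": 6,
}


def path_delay(components):
    """Sum the latencies of the components along one datapath action."""
    return sum(LATENCY_NS[name] for name in components)


# Worked example from the text: W <- R0 + 1,
# i.e. "R0 to bus to incrementer to W".
example_path = ["bus", "incrementer", "temp_write"]
example_delay = path_delay(example_path)
print(example_delay)  # 7, matching 1 + 4 + 2 = 7 nsec in the text
```

    Other datapath actions can be checked the same way by listing the
    components that lie on their path, subject to rules (1)-(4).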