CPSC 3300 – Fall 2014 – Exam 2                 Name: ______________________

1. Matching. Write the correct term from the list into each blank. (2 pts. each)

     control signal       control store      hardwired        microprogrammed
     structural hazard    data hazard        control hazard   forwarding

   (a) _____________________  pipeline stall resulting from a branch instruction
   (b) _____________________  generation of control signals from random logic or a PLA
   (c) _____________________  generation of control signals from microinstructions fetched from a control store

True/False. Circle T or F. (2 pts. each)

2. T / F   A microprogrammed control unit is typically faster than a hardwired control unit.
3. T / F   A microinstruction has a separate bit for each possible control signal.
4. T / F   Predict-taken scheme is better to implement since most branches are taken.
5. T / F   Predict-taken is easier to implement than predict-untaken since a branch target address is not needed.

[Figure: datapath (Figure 4.2 from textbook) with mux labels ❶, ❷, ❸]

6. Consider the MIPS “and” instruction as implemented on the datapath above (Figure 4.2 from textbook):

      and R3, R1, R2      // Reg[3] <- Reg[1] & Reg[2]

   Circle the correct value 0 or 1 for the control signals (a-d) and circle whether each of the three muxes (e-g) selects its upper input, lower input, or don't care. For the ALU operation (h) circle one of the function names. (The Zero condition signal will be assumed to be 0.) (12 pts.)

   (a) Branch    = 0  1
   (b) MemRead   = 0  1
   (c) MemWrite  = 0  1
   (d) RegWrite  = 0  1
   (e) Mux1 (upper left; output to PC)                   = upper, lower, don't care
   (f) Mux2 (upper middle; output to Data port of Regs)  = upper, lower, don't care
   (g) Mux3 (lower middle; output to bottom leg of ALU)  = upper, lower, don't care
   (h) ALU operation = and, or, add, subtract, set-on-less-than, nor

[Figure: five-stage pipeline (Figure 4.60 from textbook) with the IF/ID.Offset path added; mux labels ❶, ❷, ❸]

7. Briefly explain what each stage does when the instruction “and R3,R1,R2” is executed by the five-stage pipeline. You can refer to the diagram above (Figure 4.60 from textbook with offset path added). (10 pts.)

8. Consider the two muxes in the diagram above that provide inputs to the legs of the ALU. If the instruction add R3,R1,R2 is in the MEM stage and the instruction sub R5,R3,R4 is in the EX stage, circle which input is selected for each mux. For the subtract, R3 is associated with Mux1 and R4 with Mux2. (2 pts. each)

   (a) Mux1 (upper middle; output to top leg of ALU) = upper, middle, lower, don't care
   (b) Mux2 (middle; output to bottom leg of ALU)    = upper, upper middle, lower middle, lower, don't care

9. Consider the mux on the far right in the diagram above. When the instruction lw R5,20(R4) gets to the WB stage, circle which input is selected. (2 pts.)

   Mux3 (far right; output to Registers) = upper, lower, don't care

Associate each term or statement below with a type of dependency. Circle one or more of RAW, WAR, or WAW. (Destination registers are listed first for add and subtract instructions.) (3 pts. each)

10. RAW / WAR / WAW   True data dependency
11. RAW / WAR / WAW   False data dependency
12. RAW / WAR / WAW   add R3,R1,R2 followed by sub R1,R3,R4
13. RAW / WAR / WAW   add R3,R1,R2 followed by sub R5,R3,R4

14. Draw the dependency diagram for the following MIPS code. Destination registers are listed first except for the sw (store word) instructions; sw writes into memory rather than a register. (12 pts.)

      lw  R3, 0(R1)      // R3 <- memory[R1+0]
      sw  R3, 0(R2)      // memory[R2+0] <- R3
      add R5, R3, R4     // R5 <- R3 + R4
      sw  R5, 4(R2)      // memory[R2+4] <- R5
      add R1, R1, R6     // R1 <- R1 + R6

15. The branch CPI penalty is calculated as

      extra CPI = (branch freq.) * (misprediction freq.) * (mispredict penalty)

   (a) Which one of the three terms in the penalty equation will techniques like loop unrolling and predication reduce? (3 pts.)
   (b) Which of the three terms does dynamic branch prediction attempt to reduce? (3 pts.)

Associate each term or statement below with aspects of branching. Circle A or P, for Address or Prediction, respectively. Note that some questions may require both to be circled.
(2 pts. each)

16. A / P   BTAC
17. A / P   BHT
18. A / P   gshare

19. Consider a one-bit history for branch prediction. It records the state of the last branch as taken (T) or untaken (U) and predicts the next branch will be the same. Assume the bit is initialized to U. Determine the prediction accuracy on the following branch trace; include all trace entries in your calculation. (2 pts. each)

   (a) T T T T U T T T T U T T T T U T T T T U
   (b) T U T U T U T U T U T U T U T U T U T U

20. An HP processor used a three-bit branch history shift register (BHSR) in each branch history table entry and a “majority vote” of the BHSR bits to predict whether the next branch is taken (T) or untaken (U). (E.g., TUT => predict taken.) Assume the BHSR is initialized to UUU. Determine the prediction accuracy on the following branch trace; include all trace entries in your calculation. (3 pts. each)

   (a) T T T T U T T T T U T T T T U T T T T U
   (b) T U T U T U T U T U T U T U T U T U T U

21. Explain why the large majority of processors for servers, desktops, and laptops are superscalar rather than VLIW. (4 pts.)

22. In the Intel P6 pipeline diagram above, how do the Reservation Stations (RS) allow the “core” to execute instructions out of program order? (4 pts.)

23. What does the P6 provide to allow instructions to “retire” in program order (so that the processor will provide precise exceptions and can easily perform branch misprediction recovery)? (4 pts.)

XC. Consider the following datapath. (Assume all registers are edge-triggered and thus immune from races.) Control signal identifiers are given for the in and out control points of the registers. Additional control signals include memory signals Mem, R (read), W (write), and 3-bit ALU function field F.
ALU functions (three-bit F field)
---------------------------------
000: C = A + B      100: C = A - B
001: C = A          101: C = not A
010: C = A + 1      110: C = A - 1
011: C = A << 1     111: C = A >> 1

Complete the step-by-step RTL and the control signal sequence to fetch and execute an increment memory instruction “incr A”. Assume that the instruction is composed of two memory words: a one-word opcode followed by a one-word address. Assume also that the address of the instruction is in the PC, and that the memory is word-addressable. The actions of the instruction are memory[A] <- memory[A] + 1, for the memory address A given in the second word of the instruction. (up to 10 pts.)

   // fetch opcode and place in IR             // control signals
   MAR <- PC                                   5 (A=PC), F=001 (C=A), 10 (MAR=C)
   PC <- PC + 1                                5 (A=PC), F=010 (C=A+1), 11 (PC=C)
   MBR <- memory[MAR]                          Mem, R
   IR <- MBR                                   1 (A=MBR), F=001 (C=A), 13 (IR=C)
   // fetch operand address and place in MAR
   MAR <- PC                                   5 (A=PC), F=001 (C=A), 10 (MAR=C)
   PC <- PC + 1                                5 (A=PC), F=010 (C=A+1), 11 (PC=C)
   MBR <- memory[MAR]                          Mem, R
   MAR <- MBR                                  1 (A=MBR), F=001 (C=A), 10 (MAR=C)
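For Question 8, the forwarding decision for one ALU operand can be sketched in Python. This is a minimal model, assuming the usual textbook forwarding-unit conditions: a result waiting in EX/MEM takes priority over one in MEM/WB, and register 0 is never forwarded. The names `Writer` and `forward_source` are hypothetical, and the mapping from a source to a particular mux input (upper, middle, lower) depends on the diagram and is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Writer:
    """An older instruction held in a pipeline register (hypothetical model)."""
    reg_write: bool        # RegWrite control bit carried with the instruction
    rd: Optional[int]      # destination register number, or None


def forward_source(src_reg: int, ex_mem: Writer, mem_wb: Writer) -> str:
    """Return which pipeline register supplies the ALU operand read from src_reg."""
    def produces(w: Writer) -> bool:
        # forward only a real register write to a nonzero register that matches
        return w.reg_write and w.rd is not None and w.rd != 0 and w.rd == src_reg

    if produces(ex_mem):   # EX hazard: the newest result wins
        return "EX/MEM"
    if produces(mem_wb):   # MEM hazard: used only when EX/MEM did not match
        return "MEM/WB"
    return "ID/EX"         # no hazard: value read from the register file in ID


# Question 8's situation: add R3,R1,R2 is in MEM, so its result sits in EX/MEM.
# Assumption: the instruction in WB does not write a register.
add_in_mem = Writer(reg_write=True, rd=3)
idle_wb = Writer(reg_write=False, rd=None)
print("R3 operand from", forward_source(3, add_in_mem, idle_wb))
print("R4 operand from", forward_source(4, add_in_mem, idle_wb))
```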
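For Questions 10 through 14, register dependences can be classified mechanically by comparing the registers each instruction reads and writes. A minimal sketch, assuming each instruction is described by hand as a hypothetical `Instr` record; only register dependences are checked (possible memory dependences between lw and sw addresses are not modeled), and every pair is examined, so in code that writes the same register twice it may list a dependence that an intervening write actually hides.

```python
from itertools import combinations
from typing import List, NamedTuple, Set, Tuple


class Instr(NamedTuple):
    text: str
    writes: frozenset  # registers written (empty for sw, which writes memory)
    reads: frozenset   # registers read, including base registers of lw/sw


def dependences(first: Instr, second: Instr) -> Set[str]:
    """Register dependences of `second` on the earlier instruction `first`."""
    kinds = set()
    if first.writes & second.reads:
        kinds.add("RAW")   # true dependence: second reads a value first produces
    if first.reads & second.writes:
        kinds.add("WAR")   # anti-dependence: second overwrites a value first still reads
    if first.writes & second.writes:
        kinds.add("WAW")   # output dependence: both write the same register
    return kinds


def dependence_edges(program: List[Instr]) -> List[Tuple[int, int, Set[str]]]:
    """Every pair (i, j) with i < j that carries a register dependence."""
    edges = []
    for (i, a), (j, b) in combinations(enumerate(program), 2):
        kinds = dependences(a, b)
        if kinds:
            edges.append((i, j, kinds))
    return edges


# Hypothetical example (not from the exam): add R2,R1,R1 followed by sub R2,R2,R3
a = Instr("add R2,R1,R1", frozenset({2}), frozenset({1}))
b = Instr("sub R2,R2,R3", frozenset({2}), frozenset({2, 3}))
print(sorted(dependences(a, b)))   # ['RAW', 'WAW']
```

The edges returned by `dependence_edges` correspond to the arrows of a dependency diagram, labeled with their kind.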
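Hand-computed answers to Questions 19 and 20 can be checked by simulating both predictors over a trace. A minimal sketch, assuming traces are written as space-separated T/U letters as in the questions; the function names are hypothetical. The one-bit scheme remembers only the last outcome, while the BHSR scheme keeps the last three outcomes and predicts by majority vote, so the shift direction does not affect the prediction.

```python
def one_bit_accuracy(trace: str, init: str = "U") -> float:
    """One-bit history: predict that each branch repeats the previous outcome."""
    outcomes = trace.split()
    last = init
    correct = 0
    for actual in outcomes:
        if actual == last:       # the prediction is the last recorded outcome
            correct += 1
        last = actual            # record only the most recent outcome
    return correct / len(outcomes)


def majority_accuracy(trace: str, init: str = "UUU") -> float:
    """Three-bit BHSR: predict taken when at least two of the last three were taken."""
    outcomes = trace.split()
    history = list(init)
    correct = 0
    for actual in outcomes:
        prediction = "T" if history.count("T") >= 2 else "U"
        if prediction == actual:
            correct += 1
        history = history[1:] + [actual]   # shift out the oldest, shift in the newest
    return correct / len(outcomes)


trace = "T T T T U " * 4                   # trace (a) of Questions 19 and 20
print(one_bit_accuracy(trace), majority_accuracy(trace))
```

Every trace entry counts toward the total, including the first predictions made from the initial U or UUU state, as the questions require.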
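For the XC problem, the ALU function table and the given fetch steps can be modeled in Python to trace register contents. This sketch assumes a 16-bit word (the exam gives no width), treats A >> 1 as a logical shift on an unsigned word, and uses a hypothetical opcode value; the names `alu` and `fetch_opcode_and_address` are illustrative only. Each `alu` call stands for one step: a register is gated onto the A leg, F is set, and C is latched into the destination register.

```python
WORD_BITS = 16                     # assumption: the exam does not state a word width
MASK = (1 << WORD_BITS) - 1

# ALU function table from XC, keyed by the three-bit F field.
# B is only used by F=000 and F=100.
ALU_FUNCS = {
    0b000: lambda a, b: a + b,
    0b001: lambda a, b: a,
    0b010: lambda a, b: a + 1,
    0b011: lambda a, b: a << 1,
    0b100: lambda a, b: a - b,
    0b101: lambda a, b: ~a,
    0b110: lambda a, b: a - 1,
    0b111: lambda a, b: a >> 1,    # assumption: logical shift on an unsigned word
}


def alu(f: int, a: int, b: int = 0) -> int:
    """Compute C for function field F, truncated to one unsigned word."""
    return ALU_FUNCS[f](a, b) & MASK


def fetch_opcode_and_address(mem: list, regs: dict) -> None:
    """Run the given fetch steps of incr A (control signals noted per step)."""
    regs["MAR"] = alu(0b001, regs["PC"])   # MAR <- PC          : 5, F=001, 10
    regs["PC"] = alu(0b010, regs["PC"])    # PC <- PC + 1       : 5, F=010, 11
    regs["MBR"] = mem[regs["MAR"]]         # MBR <- memory[MAR] : Mem, R
    regs["IR"] = alu(0b001, regs["MBR"])   # IR <- MBR          : 1, F=001, 13
    regs["MAR"] = alu(0b001, regs["PC"])   # MAR <- PC          : 5, F=001, 10
    regs["PC"] = alu(0b010, regs["PC"])    # PC <- PC + 1       : 5, F=010, 11
    regs["MBR"] = mem[regs["MAR"]]         # MBR <- memory[MAR] : Mem, R
    regs["MAR"] = alu(0b001, regs["MBR"])  # MAR <- MBR         : 1, F=001, 10


INCR_OPCODE = 0x7                 # hypothetical opcode value
mem = [0] * 16                    # small word-addressable memory
mem[0], mem[1] = INCR_OPCODE, 9   # "incr 9" occupies words 0 and 1
mem[9] = 41                       # memory[A] before the instruction executes
regs = {"PC": 0, "MAR": 0, "MBR": 0, "IR": 0}
fetch_opcode_and_address(mem, regs)
print(regs)   # {'PC': 2, 'MAR': 9, 'MBR': 9, 'IR': 7}
```

After these steps MAR holds the operand address A and PC points past the instruction; the execute steps of incr A are left as posed in the question.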