Course on: “Advanced Computer Architectures”
Pipelining: Basic Concepts
Prof. Cristina Silvano, Politecnico di Milano
email: cristina.silvano@polimi.it
Cristina Silvano – Politecnico di Milano – Spring 2021

Outline
• Reduced Instruction Set of MIPS Processor
• Implementation of MIPS Processor
• Performance Optimization through Pipelining
• MIPS Processor Pipeline
• The Problem of Pipeline Hazards
• The Solution of Data Hazards
• MIPS Optimized Pipeline
• Performance Evaluation in Pipelining

Main Characteristics of MIPS Architecture
• RISC (Reduced Instruction Set Computer) Architecture: based on the concept of executing only simple instructions in a reduced basic cycle to improve on the performance of CISC CPUs.
• LOAD/STORE Architecture: ALU operands come from the CPU general-purpose registers and cannot come directly from memory. Dedicated instructions are necessary to:
  • load data from memory to registers
  • store data from registers to memory
• Pipeline Architecture: performance optimization technique based on overlapping the execution of multiple instructions derived from a sequential execution flow.

Reduced Instruction Set of MIPS Processor
ALU instructions:
  add  $s1, $s2, $s3       # $s1 <- $s2 + $s3
  addi $s1, $s1, 4         # $s1 <- $s1 + 4
Load/store instructions:
  lw   $s1, offset($s2)    # $s1 <- M[$s2+offset]
  sw   $s1, offset($s2)    # M[$s2+offset] <- $s1
Branch instructions to control the control flow of the program:
• Conditional branches: the branch is taken only if the condition is satisfied. Examples: beq (branch on equal) and bne (branch on not equal)
  beq  $s1, $s2, L1        # go to L1 if ($s1 == $s2)
  bne  $s1, $s2, L1        # go to L1 if ($s1 != $s2)
• Unconditional jumps: the branch is always taken. Examples: j (jump) and jr (jump register)
  j    L1                  # go to L1
  jr   $s1                 # go to the address contained in $s1

Formats of MIPS 32-bit Instructions
• Type R (Register): ALU instructions
• Type I (Immediate): immediate instructions, load/store instructions, conditional branch instructions
• Type J (Jump): unconditional jump instructions

  Type | 31-26      | 25-21      | 20-16      | 15-11      | 10-6          | 5-0
  R    | op (6-bit) | rs (5-bit) | rt (5-bit) | rd (5-bit) | shamt (5-bit) | funct (6-bit)
  I    | op (6-bit) | rs (5-bit) | rt (5-bit) | offset/immediate (16-bit, bits 15-0)
  J    | op (6-bit) | address (26-bit, bits 25-0)

Phases of Execution of MIPS Instructions
Every instruction in the MIPS subset can be implemented in at most 5 clock cycles (phases) as follows:
1) Instruction Fetch (IF):
  • Send the content of the Program Counter (PC) register to the Instruction Memory and fetch the current instruction from the Instruction Memory.
  • Update the PC to the next sequential address by adding 4 to the PC (since each instruction is 4 bytes).
2) Instruction Decode and Register Read (ID):
  • Decode the current instruction (fixed-field decoding) and read from the Register File (RF) the one or two registers specified in the instruction fields.
  • Sign-extend the offset field of the instruction in case it is needed.
3) Execution (EX): the ALU operates on the operands prepared in the previous cycle depending on the instruction type:
  • Register-Register ALU instructions: the ALU executes the specified operation on the operands read from the RF.
  • Register-Immediate ALU instructions: the ALU executes the specified operation on the first operand read from the RF and the sign-extended immediate operand.
  • Memory reference: the ALU adds the base register and the offset to calculate the effective address.
  • Conditional branches: compare the two registers read from the RF and compute the possible branch target address by adding the sign-extended offset to the incremented PC.
Phases of Execution of MIPS Instructions
4) Memory Access (ME):
  • Load instructions require a read access to the Data Memory using the effective address.
  • Store instructions require a write access to the Data Memory using the effective address, to write the data from the source register read from the RF.
  • Conditional branches can update the content of the PC with the branch target address, if the conditional test yielded true.
5) Write-Back Cycle (WB):
  • Load instructions write the data read from memory into the destination register of the RF.
  • ALU instructions write the ALU results into the destination register of the RF.

Phases of Execution of MIPS Instructions
ALU instructions: op $x,$y,$z          # $x <- $y op $z
  Instr. Fetch & PC Increm. | Read of Source Regs. $y and $z | ALU Op. ($y op $z) | Write Back of Destinat. Reg. $x
Load instructions: lw $x,offset($y)    # $x <- M[$y + offset]
  Instr. Fetch & PC Increm. | Read of Base Reg. $y | ALU Op. ($y+offset) | Read Mem. M($y+offset) | Write Back of Destinat. Reg. $x
Store instructions: sw $x,offset($y)   # M[$y + offset] <- $x
  Instr. Fetch & PC Increm. | Read of Base Reg. $y & Source $x | ALU Op. ($y+offset) | Write Mem. M($y+offset)
Conditional branch: beq $x,$y,offset
  Instr. Fetch & PC Increm. | Read of Source Regs. $x and $y | ALU Op. ($x-$y) & (PC+4+offset) | Write PC

Implementation of MIPS Processor

Basic Implementation of MIPS data path
[Figure: basic MIPS data path with PC, Instruction Memory, Register File, ALU, and Data Memory]
• Instruction Memory (read-only memory) separated from Data Memory.
• 32 General-Purpose Registers organized in a Register File (RF) with 2 read ports and 1 write port.
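The fixed-field formats and the ID/EX computations described above (field extraction, sign extension, branch target address) can be illustrated with a minimal Python sketch. It assumes the standard MIPS32 encoding, which the slides do not state: opcode 0 for R-type instructions, opcodes 2 and 3 for J-type jumps, and funct 0x20 for add. The function names decode, sign_extend16, and branch_target are illustrative only.

```python
def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit field, as done in the ID stage."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(word: int) -> dict:
    """Split a 32-bit instruction word into its fixed fields."""
    op = (word >> 26) & 0x3F              # bits 31-26
    if op == 0:                           # assumption: opcode 0 means R-type
        return {
            "format": "R", "op": op,
            "rs": (word >> 21) & 0x1F,    # bits 25-21
            "rt": (word >> 16) & 0x1F,    # bits 20-16
            "rd": (word >> 11) & 0x1F,    # bits 15-11
            "shamt": (word >> 6) & 0x1F,  # bits 10-6
            "funct": word & 0x3F,         # bits 5-0
        }
    if op in (2, 3):                      # assumption: opcodes of j and jal
        return {"format": "J", "op": op, "address": word & 0x03FFFFFF}
    return {
        "format": "I", "op": op,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": sign_extend16(word),       # bits 15-0, sign-extended
    }


def branch_target(pc: int, imm: int) -> int:
    """Incremented PC plus the sign-extended offset shifted left by 2 bits."""
    return (pc + 4 + (imm << 2)) & 0xFFFFFFFF


# add $s1, $s2, $s3 with the conventional register numbers 17, 18, 19
# (assumption: standard MIPS register numbering and funct = 0x20 for add)
word = (18 << 21) | (19 << 16) | (17 << 11) | 0x20
fields = decode(word)
print(fields["format"], fields["rs"], fields["rt"], fields["rd"])
```

The 2-bit left shift in branch_target mirrors the 2-bit left shifter of the data path: the offset counts instructions, while the PC counts bytes.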
Implementation of ALU and Load/Store Instructions
[Figure: data path for ALU and load/store instructions: instruction fields [25-21], [20-16], [15-11], [15-0]; Register File; 16-bit to 32-bit sign extension; ALU; Data Memory; multiplexers]

Implementation of Conditional Branch Instructions
[Figure: data path for conditional branches: Register File, ALU with Zero output, sign extension, 2-bit left shifter, and adder computing the branch target address from PC+4 (from fetch)]

Implementation of MIPS data path
[Figure: complete MIPS data path: PC, +4 adder, Instruction Memory, Register File, sign extension, 2-bit left shifter, branch adder, ALU, Data Memory, multiplexers]

Implementation of MIPS data path with Control Unit
[Figure: MIPS data path with Control Unit (input: instruction bits [31-26]) and ALU Control Unit (input: instruction bits [5-0]); signals shown: Branch, MemWR, MemRD, MemToReg, RegWR, ALU_op, ALU_opB, Destination Register]

MIPS PIPELINING

Pipelining
• Performance optimization technique based on overlapping the execution of multiple instructions deriving from a sequential execution flow.
• Pipelining exploits the parallelism among instructions in a sequential instruction stream.
• Basic idea: the execution of an instruction is divided into different phases (pipeline stages), each requiring a fraction of the time necessary to complete the instruction.
• The stages are connected one to the next to form the pipeline: instructions enter the pipeline at one end, progress through the stages, and exit from the other end, as in an assembly line.

Pipelining
• Advantage: the technique is transparent to the programmer.
• The technique is similar to an assembly line: a new car exits from the assembly line in the time necessary to complete one of the phases.
• An assembly line does not reduce the time necessary to complete a car, but it increases the number of cars being built simultaneously and the rate at which cars are completed.

Sequential vs. Pipelining Execution
Sequential execution (10 ns per instruction):
  I1   IF  ID  EX  MEM WB
  I2                       IF  ID  EX  MEM WB
Pipelined execution (a new instruction starts every 2 ns):
  I1   IF  ID  EX  MEM WB
  I2       IF  ID  EX  MEM WB
  I3           IF  ID  EX  MEM WB
  I4               IF  ID  EX  MEM WB
  I5                   IF  ID  EX  MEM WB

Pipelining
• The time to advance an instruction by one stage in the pipeline corresponds to a clock cycle.
• The pipeline stages must be synchronized: the duration of a clock cycle is defined by the time required by the slowest stage of the pipeline (i.e. 2 ns).
• The goal is to balance the length of each pipeline stage.
• If the stages are perfectly balanced, the ideal speedup due to pipelining is equal to the number of pipeline stages.

Performance Improvement
Ideal case (asymptotically): if we consider the multicycle unpipelined CPU3 composed of 5 cycles of 2 ns and the pipelined CPU2 with 5 stages of 2 ns:
• The latency (total execution time) of each instruction is unchanged (10 ns).
• The throughput (number of instructions completed per unit of time) is improved by 5 times: (1 instruction completed every 10 ns) vs. (1 instruction completed every 2 ns).

Pipeline Execution of MIPS Instructions
  IF: Instruction Fetch | ID: Instruction Decode | EX: Execution | ME: Memory Access | WB: Write Back

Pipeline Execution of MIPS Instructions
ALU instructions: op $x,$y,$z          # $x <- $y op $z
  IF: Instr. Fetch & PC Increm. | ID: Read of Source Regs. $y and $z | EX: ALU Op. ($y op $z) | ME: - | WB: Write Back Destinat. Reg. $x
Load instructions: lw $x,offset($y)    # $x <- M[$y + offset]
  IF: Instr. Fetch & PC Increm. | ID: Read of Base Reg. $y | EX: ALU Op. ($y+offset) | ME: Read Mem. M($y+offset) | WB: Write Back Destinat. Reg. $x
Store instructions: sw $x,offset($y)   # M[$y + offset] <- $x
  IF: Instr. Fetch & PC Increm. | ID: Read of Base Reg. $y & Source $x | EX: ALU Op. ($y+offset) | ME: Write Mem. M($y+offset) | WB: -
Conditional branches: beq $x,$y,offset
  IF: Instr. Fetch & PC Increm. | ID: Read of Source Regs. $x and $y | EX: ALU Op. ($x-$y) & (PC+4+offset) | ME: Write PC | WB: -

Implementation of MIPS pipeline
• The division of the execution of each instruction into 5 stages implies that in each clock cycle 5 instructions are in execution.
• The implementation of a pipelined CPU with 5 stages must therefore be composed of 5 modules corresponding to the 5 execution stages.
• We need pipeline registers to separate the different stages.

Implementation of MIPS pipeline
[Figure: pipelined MIPS data path divided into the stages IF (Instruction Fetch), ID (Instruction Decode), EX (Execution), MEM (Memory Access), and WB (Write Back), separated by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]

Resources used during the pipeline execution
Each column is one 2 ns clock cycle (time runs left to right):
  I1   IM   REG  ALU  DM   REG
  I2        IM   REG  ALU  DM   REG
  I3             IM   REG  ALU  DM   REG
  I4                  IM   REG  ALU  DM   REG
  I5                       IM   REG  ALU  DM   REG
IM = Instruction Memory, REG = Register File, DM = Data Memory

The Problem of Pipeline Hazards
• A hazard (conflict) is created whenever there is a dependence between instructions, and the instructions are close enough that the overlap caused by pipelining would change the order of access to the operands involved in the dependence.
• Hazards prevent the next instruction in the pipeline from executing during its designated clock cycle.
• Hazards reduce the performance from the ideal speedup gained by pipelining.
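The ideal speedup that hazards erode can be made concrete with a short Python sketch. It assumes the figures used in these slides (5 stages of 2 ns each) and no stalls; the function names are illustrative only.

```python
def sequential_time_ns(n_instr: int, n_stages: int = 5, stage_ns: float = 2.0) -> float:
    """Unpipelined multicycle CPU: every instruction takes all its cycles alone."""
    return n_instr * n_stages * stage_ns


def pipelined_time_ns(n_instr: int, n_stages: int = 5, stage_ns: float = 2.0) -> float:
    """Ideal pipeline (no stalls): the first instruction needs n_stages cycles,
    then one instruction completes in every following cycle."""
    return (n_stages + n_instr - 1) * stage_ns


def speedup(n_instr: int, n_stages: int = 5, stage_ns: float = 2.0) -> float:
    """Ratio between sequential and ideal pipelined execution time."""
    return (sequential_time_ns(n_instr, n_stages, stage_ns)
            / pipelined_time_ns(n_instr, n_stages, stage_ns))


for n in (1, 5, 100, 1_000_000):
    print(n, sequential_time_ns(n), pipelined_time_ns(n), round(speedup(n), 4))
```

For a single instruction there is no gain (the latency is still 10 ns); as the number of instructions grows, the speedup approaches the number of stages.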
Three Classes of Hazards
1) Structural Hazards: attempt to use the same resource from different instructions simultaneously
  • Example: single memory for instructions and data
2) Data Hazards: attempt to use a result before it is ready
  • Example: instruction depending on a result of a previous instruction still in the pipeline
3) Control Hazards: attempt to make a decision on the next instruction to execute before the condition is evaluated
  • Example: conditional branch execution
  • Control hazards will be studied in the next lesson

Structural Hazards
No structural hazards in MIPS architecture:
• Instruction Memory separated from Data Memory
• Register File used in the same clock cycle: read access by one instruction and write access by another instruction
Each column is one 2 ns clock cycle (time runs left to right):
  I1   IM   REG  ALU  DM   REG
  I2        IM   REG  ALU  DM   REG
  I3             IM   REG  ALU  DM   REG
  I4                  IM   REG  ALU  DM   REG
  I5                       IM   REG  ALU  DM   REG

Data Hazards
If the instructions executed in the pipeline are dependent, data hazards can arise when the instructions are too close.
Example:
  sub $2, $1, $3       # Reg. $2 written by sub
  and $12, $2, $5      # 1st operand ($2) depends on sub
  or  $13, $6, $2      # 2nd operand ($2) depends on sub
  add $14, $2, $2      # 1st ($2) & 2nd ($2) operands depend on sub
  sw  $15, 100($2)     # Base reg. ($2) depends on sub

Data Hazards: Example
  sub $2, $1, $3      IF  ID  EX  ME  WB
  and $12, $2, $5         IF  ID  EX  ME  WB
  or  $13, $6, $2             IF  ID  EX  ME  WB
  add $14, $2, $2                 IF  ID  EX  ME  WB
  sw  $15, 100($2)                    IF  ID  EX  ME  WB

The Solution of Data Hazards

Data Hazards: Possible Solutions
Compilation techniques:
a) Insertion of nop (no operation) instructions
b) Instruction scheduling to keep dependent instructions from being too close
  • The compiler tries to insert independent instructions among dependent instructions.
  • When the compiler does not find independent instructions, it inserts nops.
Hardware techniques:
c) Insertion of stalls or “bubbles” in the pipeline
d) Data forwarding or bypassing

a) Insertion of nops: Example
  sub $2, $1, $3      IF  ID  EX  ME  WB
  nop                     IF  ID  EX  ME  WB
  nop                         IF  ID  EX  ME  WB
  nop                             IF  ID  EX  ME  WB
  and $12, $2, $5                     IF  ID  EX  ME  WB
  or  $13, $6, $2                         IF  ID  EX  ME  WB
  add $14, $2, $2                             IF  ID  EX  ME  WB
  sw  $15, 100($2)                                IF  ID  EX  ME  WB

b) Scheduling: Example
  Original code:            Scheduled code:
  sub $2, $1, $3            sub $2, $1, $3
  and $12, $2, $5           add $4, $10, $11
  or  $13, $6, $2           and $7, $8, $9
  add $14, $2, $2           lw  $16, 100($18)
  add $4, $10, $11          and $12, $2, $5
  and $7, $8, $9            or  $13, $6, $2
  lw  $16, 100($18)         add $14, $2, $2
  sw  $15, 100($2)          sw  $15, 100($2)

c) Insertion of Stalls: Example
The previous instructions continue, while the dependent instruction and its successors wait:
  sub $2, $1, $3      IF    ID    EX    ME    WB
  and $12, $2, $5           IF    stall stall stall ID    EX    ME    WB
  or  $13, $6, $2                 stall stall stall IF    ID    EX    ME    WB
  add $14, $2, $2                                         IF    ID    EX    ME    WB
  sw  $15, 100($2)                                              IF    ID    EX    ME    WB

d) Forwarding
• Data forwarding uses temporary results stored in the pipeline registers instead of waiting for the write back of results in the RF.
• We need to add multiplexers at the inputs of the ALU to fetch inputs from the pipeline registers, avoiding the insertion of stalls in the pipeline.

Forwarding: Example
  sub $2, $1, $3      IF  ID  EX  ME  WB
  and $12, $2, $5         IF  ID  EX  ME  WB            (EX/EX path)
  or  $13, $6, $2             IF  ID  EX  ME  WB        (MEM/EX path)
  add $14, $2, $2                 IF  ID  EX  ME  WB    (MEM/ID path)
  sw  $15, 100($2)                    IF  ID  EX  ME  WB

Forwarding Paths
[Figure: resources used by I1..I5 (IM, REG RD, ALU, DM, REG WR) with the forwarding paths drawn between instructions]
Three data forwarding paths:
• EX/EX path
• MEM/EX path
• MEM/ID path

Implementation of MIPS with Forwarding Unit
[Figure: pipelined data path with Forwarding Unit: PC, Instruction Memory, Register File, ALU, Data Memory, and the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; signals shown at the forwarding unit: IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, EX/MEM.RegisterRd, MEM/WB.RegisterRd; multiplexers select among the EX/EX, MEM/EX, and MEM/ID paths and the WB path]

Data Hazards: Load/Use Hazard
  L1: lw  $s0, 4($t1)       # $s0 <- M[4 + $t1]
  L2: add $s5, $s0, $s1     # 1st operand depends on L1

                      CK1   CK2   CK3   CK4   CK5   CK6   CK7
  lw  $s0, 4($t1)     IF    ID    EX    MEM   WB
  add $s5, $s0, $s1         IF    ID    EX    MEM   WB

Data Hazards: Load/Use Hazard
With forwarding using the MEM/EX path: 1 stall needed
                      CK1   CK2   CK3   CK4   CK5   CK6   CK7
  lw  $s0, 4($t1)     IF    ID    EX    MEM   WB
  add $s5, $s0, $s1         IF    stall ID    EX    MEM   WB

Data Hazards: Load/Store Hazard
  L1: lw $s0, VECTA($t1)    # $s0 <- M[VECTA + $t1]
  L2: sw $s0, VECTB($t1)    # M[VECTB + $t1] <- $s0

                      CK1   CK2   CK3   CK4   CK5   CK6   CK7
  lw $s0, VECTA($t1)  IF    ID    EX    MEM   WB
  sw $s0, VECTB($t1)        IF    ID    EX    MEM   WB

Data Hazards: Load/Store Hazard
With forwarding by introducing the MEM/MEM path: solved
                      CK1   CK2   CK3   CK4   CK5   CK6   CK7
  lw $s0, VECTA($t1)  IF    ID    EX    MEM   WB
  sw $s0, VECTB($t1)        IF    ID    EX    MEM   WB

Forwarding Paths
[Figure: resources used by I1..I5 (IM, REG RD, ALU, DM, REG WR) with the forwarding paths; vertical axis: instruction execution order; horizontal axis: time]
Four data forwarding paths:
• EX/EX path
• MEM/EX path
• MEM/ID path
• MEM/MEM path (for LOAD/STOREs)

MIPS Optimized Pipeline

MIPS Optimized Pipeline
• Register File used in 2 stages: read access during ID and write access during WB.
• What happens if read and write refer to the same register in the same clock cycle?
  • It is necessary to insert one stall.
• Optimized Pipeline: we assume the RF read occurs in the second half of the clock cycle and the RF write in the first half of the clock cycle.
• What happens if read and write refer to the same register in the same clock cycle?
  • It is not necessary to insert a stall.

Resources Used in the Optimized Pipeline
[Figure: resources used by I1..I5 (IM, REG, ALU, DM, REG) in successive 2 ns clock cycles; vertical axis: instruction execution order; horizontal axis: time]
IM = Instruction Memory, REG = Register File, DM = Data Memory

Data Hazards in the Optimized Pipeline: Example
  sub $2, $1, $3
  and $12, $2, $5
  or  $13, $6, $2
  add $14, $2, $2
  sw  $15, 100($2)
[Figure: resources used by the five instructions in successive 2 ns clock cycles]
It is necessary to insert two stalls.

Forwarding Paths in the Optimized Pipeline
[Figure: resources used by I1..I5 (IM, REG RD, ALU, DM, REG WR) with the forwarding paths]
Only three data forwarding paths:
• EX/EX path
• MEM/EX path
• MEM/MEM path (for LOAD/STOREs)

Data hazards: RAW, WAW, WAR

Data Hazards
Data hazards analyzed up to now are:
1) RAW (READ AFTER WRITE) hazard: instruction n+1 tries to read a source register before the previous instruction n has written it in the RF.
  • Example:
      add $r1, $r2, $r3
      sub $r4, $r1, $r5
  • By using forwarding, it is always possible to solve this conflict without introducing stalls, except for load/use hazards, where it is necessary to add one stall.

Data Hazards
Other types of data hazards in the pipeline:
2) WAW (WRITE AFTER WRITE) hazard
3) WAR (WRITE AFTER READ) hazard
WAW and WAR hazards occur more easily when instructions are executed out of order, such as with multi-cycle operations to execute floating-point arithmetic or to access the data memory (load/store).

Data Hazards: WAW (WRITE AFTER WRITE)
WAW (WRITE AFTER WRITE) hazard: instruction n+1 tries to write a destination operand before it has been written by the previous instruction n, so the write operations are executed in the wrong order (out of order).
• WAW hazards cannot occur in the MIPS pipeline because all the register write operations occur in the WB stage.
• WAW hazards could occur in the MIPS pipeline when it is extended to handle multi-cycle operations to execute or to access the data memory, because in this case instructions can complete in a different order than they were issued.
Data Hazards: WAW (WRITE AFTER WRITE)
Example: if we assume the register write in the ALU instructions occurs in the fourth stage and that load instructions require two stages (MEM1 and MEM2) to access the data memory, we can have:
                      CK1  CK2  CK3  CK4  CK5  CK6  CK7
  lw  $r1, 0($r2)     IF   ID   EX   MEM1 MEM2 WB
  add $r1, $r2, $r3        IF   ID   EX   WB

Data Hazards: WAW (WRITE AFTER WRITE)
Example: if we assume the floating-point ALU operations require a multi-cycle execution, we can have:
                      CK1  CK2  CK3  CK4  CK5  CK6  CK7  CK8
  mul $f6, $f2, $f2   IF   ID   MUL1 MUL2 MUL3 MUL4 MEM  WB
  add $f6, $f2, $f2        IF   ID   AD1  AD2  MEM  WB

Data Hazards: WAR (WRITE AFTER READ)
WAR (WRITE AFTER READ) hazard: instruction n+1 tries to write a destination operand before it has been read by the previous instruction n, so instruction n reads the wrong value.
For example:
  sw   $y, 0($x)       # sw has to read $x
  addi $x, $x, 4       # addi writes $x
• WAR hazards cannot occur in the MIPS pipeline because operand reads always occur in the ID stage and result writes in the WB stage.
• As before, if we assume the register write in the ALU instructions occurs in the fourth stage and that we need two stages to access the data memory, some instructions could read operands too late in the pipeline.

Performance evaluation in pipelining

Performance Evaluation in Pipelining
• Pipelining increases the CPU instruction throughput (number of instructions completed per unit of time), but it does not reduce the execution time (latency) of a single instruction.
• Pipelining usually slightly increases the latency of each instruction due to the imbalance among the pipeline stages and the overhead in the control of the pipeline.
  • Imbalance among pipeline stages reduces performance since the clock can run no faster than the time needed for the slowest pipeline stage.
  • Pipeline overhead arises from pipeline register delay and clock skew.
• All instructions should have the same number of pipeline stages.

Performance Metrics
  IC = Instruction Count
  # Clock Cycles = IC + # Stall Cycles + 4
  CPI = Clocks Per Instruction = # Clock Cycles / IC = (IC + # Stall Cycles + 4) / IC
  MIPS = f_clock / (CPI * 10^6)

Example
  IC = Instruction Count = 5
  # Clock Cycles = IC + # Stall Cycles + 4 = 5 + 3 + 4 = 12
  CPI = Clocks Per Instruction = # Clock Cycles / IC = 12 / 5 = 2.4
  MIPS = f_clock / (CPI * 10^6) = 500 MHz / (2.4 * 10^6) = 208.3

                      C1    C2    C3    C4    C5    C6    C7    C8    C9    C10   C11   C12
  sub $2, $1, $3      IF    ID    EX    ME    WB
  and $12, $2, $5           IF    stall stall stall ID    EX    ME    WB
  or  $13, $6, $2                 stall stall stall IF    ID    EX    ME    WB
  add $14, $2, $2                                         IF    ID    EX    ME    WB
  sw  $15, 100($2)                                              IF    ID    EX    ME    WB

Performance Metrics (2)
Let us consider n iterations of a loop composed of m instructions per iteration, requiring k stalls per iteration:
  IC_per_iter = m
  # Clock Cycles_per_iter = IC_per_iter + # Stall Cycles_per_iter + 4
  CPI_per_iter = (IC_per_iter + # Stall Cycles_per_iter + 4) / IC_per_iter = (m + k + 4) / m
  MIPS_per_iter = f_clock / (CPI_per_iter * 10^6)

Asymptotic Performance Metrics
Let us consider n iterations of a loop composed of m instructions per iteration, requiring k stalls per iteration:
  IC_AS = Instruction Count_AS = m * n
  # Clock Cycles_AS = IC_AS + # Stall Cycles_AS + 4
  CPI_AS = lim (n -> inf) (IC_AS + # Stall Cycles_AS + 4) / IC_AS
         = lim (n -> inf) (m * n + k * n + 4) / (m * n)
         = (m + k) / m
  MIPS_AS = f_clock / (CPI_AS * 10^6)

Performance Issues in Pipelining
The ideal CPI on a pipelined processor would be 1, but stalls cause the pipeline performance to degrade from the ideal performance, so we have:
  Ave. CPI_Pipe = Ideal CPI + Pipe Stall Cycles per Instruction
                = 1 + Pipe Stall Cycles per Instruction
Pipeline Stall Cycles per Instruction are due to:
  Structural Hazards + Data Hazards + Control Hazards + Memory Stalls (we will see in the next lessons)

Reference
Appendix A of the textbook: J. Hennessy, D. Patterson, “Computer Architecture: A Quantitative Approach”, 4th Edition, Morgan Kaufmann Publishers.
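The performance metrics of this lesson can be checked numerically with a small Python sketch. It assumes the 5-stage pipeline of these slides, so the "+ 4" term is written as n_stages - 1, which is the number of cycles needed to fill the pipeline; the function names are illustrative only.

```python
def clock_cycles(ic: int, stall_cycles: int, n_stages: int = 5) -> int:
    """# Clock Cycles = IC + # Stall Cycles + (n_stages - 1)."""
    return ic + stall_cycles + (n_stages - 1)


def cpi(ic: int, stall_cycles: int, n_stages: int = 5) -> float:
    """Clocks per instruction for a finite instruction sequence."""
    return clock_cycles(ic, stall_cycles, n_stages) / ic


def mips(f_clock_hz: float, cpi_value: float) -> float:
    """MIPS = f_clock / (CPI * 10^6)."""
    return f_clock_hz / (cpi_value * 1e6)


def asymptotic_cpi(m: int, k: int) -> float:
    """CPI_AS for a loop of m instructions and k stalls per iteration."""
    return (m + k) / m


# Example of the slides: IC = 5, 3 stall cycles, f_clock = 500 MHz
example_cpi = cpi(5, 3)
print(clock_cycles(5, 3), example_cpi, round(mips(500e6, example_cpi), 1))

# Loop with m = 5 instructions and k = 3 stalls per iteration:
# the CPI over n iterations approaches (m + k) / m as n grows
for n in (1, 10, 1000):
    print(n, cpi(5 * n, 3 * n), asymptotic_cpi(5, 3))
```

The loop at the end shows why the fill term disappears asymptotically: its contribution of 4 cycles is divided by an instruction count that grows with n.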