Announcement
  No class next week
    • Classes on Tuesday 4/12 and Thursday 4/14 are canceled
  Final project presentations on Tuesday 4/19 and Thursday 4/21
    • Tuesday: Teams 5-7, Thursday: Teams 1-4
    • Send me your presentation slides before Tuesday's class
  Final project report is due before the final exam
  Final review on April 28
  Final exam is scheduled during exam week

ECE 351 Digital Systems Design

Recap from the last class
  Origin of timing issues
    Unpredictable delay in combinatorial logic
    Setup/hold time in D flip-flops
  Timing issues
    Combinatorial delay in synchronous design
      • Static timing analysis
    Asynchronous inputs
      • Timing hazards and solutions
    Metastability
      • Quantification
    Timing simulation

ECE 351 Digital Systems Design
Von Neumann Computer Architecture
Wei Gao
Spring 2016

von Neumann Stored Program Computer
  "Stored program" means the hardware is designed to execute a set of pre-defined instructions
  The program and data reside in a storage unit (i.e., memory)
  To change the computer's functionality, the program is changed instead of the hardware
  This implies sequential execution
  The drawback is the "von Neumann bottleneck" in getting data into and out of memory

Block Diagram of von Neumann Computer
  CPU + Memory
  Clock synchronized via system bus

Block Diagram of the Central Processing Unit (CPU)

Central Processing Unit (CPU)
  CPU components
    Control unit
      • The state machine that directs the execution of instructions
      • For a given operation, the state machine traverses a specific path within its state diagram
    Processing unit
      • Contains all of the registers and the ALU that hold and manipulate data
      • Memory signals (data/address) come into and out of this unit
    Control signals
      • Signals sent to the processing unit from the control unit
      • Load data into registers
      • Select the ALU operation
    Test signals

Processing Unit
Registers within
the processing unit
    Instruction Register (IR)
      • Holds the Opcode that is read from memory
      • Passes the Opcode to the Control Unit as a test signal
    Memory Address Register (MAR)
      • Holds the current address being sent to memory
    Program Counter (PC)
      • Tracks the address of the instruction being executed
      • MAR tracks PC when executing an instruction
    ALU Operand Register (Z)
      • Holds one of the inputs to the ALU
      • The other input comes from one of the user-controlled registers

Arithmetic / Logic Unit (ALU)
  Performs arithmetic and logic operations on data
  ALU operations
    We first load Z with the first input
    We then select which user-controlled register is the other input
    The control unit sends select lines to indicate which operation to perform
  Condition Code Register (CCR)
    Tracks the status of ALU operations (i.e., the N, Z, V, and C flags: negative, zero, overflow, carry)
    These signals are sent to the control unit in order to alter the sequence flow

Buses
  We route data in the processing unit between registers and memory using shared lines called buses
  Two buses are needed
    Bus 1:
      • Can take either PC or the User-Controlled Registers
      • Will drive Memory_In or Bus 2
    Bus 2:
      • Can take ALU, Bus 1, or Memory_Out
      • Will drive IR, MAR, PC, the User-Controlled Registers, or the ALU Operand Register
  Bus select lines come from the Control Unit to select which information is on which bus at any given time.

Register Loads
  Each register in the processing unit can be loaded by the control unit
  The input to most registers is Bus2
  The CCR input is the ALU
  Loads are synchronous to the clock and take effect in the following state
  We can make a register as follows:

    MAR_Register : process (Clock, Reset)
    begin
      if (Reset = '0') then                        -- asynchronous, active-low reset
        MAR <= "0000";
      elsif (Clock'event and Clock = '1') then     -- rising clock edge
        if (MAR_Load = '1') then                   -- load only when the control unit asserts MAR_Load
          MAR <= Bus2;
        end if;
      end if;
    end process;

Control Unit Sub-Operations 1.
Fetch
  Get the next instruction into IR
  PC: program counter, always points to the next instruction
  IR: holds the fetched instruction
  [Figure: processor block diagram with the labels Control unit, Datapath, ALU, Controller, Control/Status, Registers, PC, IR, MAR, X, Y, and I/O; memory shows "load X," at address 100, 123 at address 101, and "ADD X, Y" at address 102.]

Control Unit Sub-Operations 2. Decode
  Determine what the instruction means
  [Figure: the same processor block diagram, decode step.]

Control Unit Sub-Operations 3. Fetch operands
  Move data from memory to a data-path register
  [Figure: the same processor block diagram, operand fetch step; the values 101 and 123 now appear in the processor.]

Control Unit Sub-Operations 4. Execute
  Move data through the ALU
  [Figure: the same processor block diagram, execute step.]

Control Unit Sub-Operations 5.
Store results
  Write data from a register to memory
  [Figure: processor block diagram (control unit, datapath, memory, I/O), store step.]

Instruction Execution
  State 0
    Put the current Program Counter value on the Memory Address Bus to read the first Opcode
    RTL: MAR <= PC
    Control: Bus1_Sel = PC, Bus2_Sel = Bus1, MAR_Load
  State 1
    Bring in the contents of memory (the Opcode) and put them into IR
    Increment PC to point at either the Operand or the next Opcode in memory
    RTL: IR <= Memory_Out, PC <= PC + 1
    Control: Bus2_Sel = Memory_Out, IR_Load, PC_Inc

Instruction Execution
  State 2
    The Control Unit now decodes IR
    This dictates the next state and which control signals are asserted
    (IR = ADD_XY)
    RTL: Z <= X
    Control: Bus1_Sel = X, Bus2_Sel = Bus1, Z_Load
  State 3
    RTL: ALU = ADD, i.e., X <= Z + Y
    Control: Bus1_Sel = Y, Bus2_Sel = ALU, ALU_Sel = ADD, X_Load, CCR_Load

Von Neumann Bottleneck
  We have seen that the von Neumann computer is serial in its execution of instructions
  This is good for simplicity, but can limit performance
  There are many techniques to improve the performance of this computer
    Functional Timing
    Memory Architecture
    Algorithmic Branch Prediction
    Pipelines

1. Functional Timing
  A delay (or phase) can be added to the clock that the B-register sees.
  This creates a single-shot structure which executes in 1 cycle
  This allows multiple register transfers in one clock cycle
  [Figure: registers A and B with clocks CLKA and CLKB derived from CLK, and a timing diagram with the labels LOAD (from controller), AQ, BQ, A(0), A(1), tphase, and tCQ.]

2.
Memory Architecture
  The main bottleneck is getting Opcodes, Operands, and Data out of memory
  Memory systems run slower than CPUs, so access needs to be slowed to accommodate the memory technology (e.g., DRAM, which stores bits on capacitors)
  Cache memory is a technique to improve the speed of memory access
  Cache is smaller, faster SRAM that is placed on the same chip as the CPU
  Cache improves performance
  Latency = the timing overhead associated with accessing memory

3. Algorithmic Branch Prediction
  Algorithms can be developed to “predict” a potential branch
  This would allow the memory controller to load a portion of Cache with the code that would be executed if the branch were taken

4. Pipelining
  [Figure: three timing diagrams against a Time axis. Non-pipelined dish cleaning: Wash and Dry of dishes 1-8 without overlap. Pipelined dish cleaning: Wash and Dry of dishes 1-8 overlapped. Pipelined instruction execution: the stages Fetch-instr., Decode, Fetch ops., Execute, and Store res. overlapped for instructions 1-8.]

Summary
  Von Neumann architecture
    CPU + memory
  Central Processing Unit (CPU)
    Control unit
    Processing unit
    Control/test signals
  Control unit operations
    Instruction execution
  Von Neumann improvements
    Functional timing
    Memory architecture – using cache
    Algorithmic branch prediction
    Pipelining
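
Example: control unit sketch for ADD_XY
  The instruction execution states (State 0 through State 3 for IR = ADD_XY) can be sketched as a VHDL state machine in the same style as the MAR_Register process. This is a minimal sketch, not the course's reference design. The entity name, port widths, select-line encodings, state names, and the ADD_XY opcode value are all assumptions made for illustration; only the control signal names and the per-state RTL come from the slides.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical control unit: names, widths, and encodings below are
-- assumptions for illustration, not taken from the lecture.
entity control_unit is
  port (
    Clock, Reset : in  std_logic;                     -- Reset is active low, as in MAR_Register
    IR           : in  std_logic_vector(3 downto 0);  -- opcode (test signal from the processing unit)
    Bus1_Sel     : out std_logic_vector(1 downto 0);  -- assumed: "00" = PC, "01" = X, "10" = Y
    Bus2_Sel     : out std_logic_vector(1 downto 0);  -- assumed: "00" = ALU, "01" = Bus1, "10" = Memory_Out
    ALU_Sel      : out std_logic;                     -- assumed: '1' = ADD (only operation modeled)
    MAR_Load, IR_Load, PC_Inc, Z_Load, X_Load, CCR_Load : out std_logic
  );
end entity control_unit;

architecture rtl of control_unit is
  constant ADD_XY : std_logic_vector(3 downto 0) := "0001";  -- assumed opcode encoding
  type state_type is (S0_FETCH, S1_LOAD_IR, S2_DECODE, S3_ADD);
  signal state : state_type;
begin

  -- State register: asynchronous active-low reset, advances on the rising clock edge.
  state_register : process (Clock, Reset)
  begin
    if (Reset = '0') then
      state <= S0_FETCH;
    elsif (Clock'event and Clock = '1') then
      case state is
        when S0_FETCH   => state <= S1_LOAD_IR;
        when S1_LOAD_IR => state <= S2_DECODE;
        when S2_DECODE  =>
          if (IR = ADD_XY) then
            state <= S3_ADD;
          else
            state <= S0_FETCH;  -- other opcodes are not modeled in this sketch
          end if;
        when S3_ADD     => state <= S0_FETCH;
      end case;
    end if;
  end process;

  -- Output logic: control signals asserted in each state.
  output_logic : process (state, IR)
  begin
    -- Defaults: no register loads, buses parked on PC -> Bus1 -> Bus2.
    Bus1_Sel <= "00";  Bus2_Sel <= "01";  ALU_Sel  <= '0';
    MAR_Load <= '0';   IR_Load  <= '0';   PC_Inc   <= '0';
    Z_Load   <= '0';   X_Load   <= '0';   CCR_Load <= '0';

    case state is
      when S0_FETCH =>                    -- State 0, RTL: MAR <= PC
        Bus1_Sel <= "00";  Bus2_Sel <= "01";  MAR_Load <= '1';
      when S1_LOAD_IR =>                  -- State 1, RTL: IR <= Memory_Out, PC <= PC + 1
        Bus2_Sel <= "10";  IR_Load <= '1';  PC_Inc <= '1';
      when S2_DECODE =>                   -- State 2, RTL (IR = ADD_XY): Z <= X
        if (IR = ADD_XY) then
          Bus1_Sel <= "01";  Bus2_Sel <= "01";  Z_Load <= '1';
        end if;
      when S3_ADD =>                      -- State 3, RTL: ALU = ADD, result loaded into X
        Bus1_Sel <= "10";  Bus2_Sel <= "00";  ALU_Sel <= '1';
        X_Load <= '1';  CCR_Load <= '1';
    end case;
  end process;

end architecture rtl;
```

  Each load signal is asserted in one state and the register captures its new value on the next rising clock edge, which matches the note that loads take effect in the following state. IR is therefore valid by the time the state machine reaches the decode state.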