Course Website: http://www.eng.tau.ac.il/~marko

Advanced Computer Structure Laboratory
Chapter 1: Orientation
Liron David

Course’s Staff
• Prof. Guy Even guy@eng.tau.ac.il
• Lab Assistant: Marko Markov marko@eng.tau.ac.il
• Recitations: Liron David lirondavid@gmail.com

The RESA computer
The RESA computer is based on the XSA-3s1000 board, which contains:
1. An FPGA (Field Programmable Gate Array) with 1000K gates
2. 32 MB of SDRAM
3. A direct interface to a PC
The FPGA is a device that can be programmed while it is on the board.
The lab’s goal is to design and implement a simplified DLX CPU on the FPGA, while the SDRAM serves as the main memory of this CPU.

Work flow
1. Design Entry - editing the design using the software. Use a top-down approach. Add remarks to your designs.
2. Simulation - testing your design. Find mistakes in wire connections, wrong polarities, incompatible signal names, etc.
3. Implementation - a configuration file (called a '.bit' file) is created from your design. This stage is fully automatic.
4. Running your design on the RESA.

Schedule
Handout #   Recitation Subject                       Week No.
1           Orientation                              1
2           The RESA’s Bus                           1
3           Bus Slave (with I/O control logic)       2
4           Monitor slave                            3
4           Monitor slave Cont.                      4
5           Read/Write machine                       5
6           Load/Store machine                       6
6           Test Load/Store machine                  7
7           DLX – Design                             8
7           DLX – Simulation + Implementation        9
7           DLX – Testing + Timing                   10
7           DLX – Program                            11
• On the purple labeled weeks, a recitation will be held.
• Recommended reading material: the matching chapter of the lab notes.
• Every group contains 2-3 students.

Report’s Submission
• The reports should be submitted via e-mail as one file in pdf format with the group number (without names and IDs).
• Every answer/design should be followed by an explanation.
• Explain the main signals’ behavior on waveforms, data snapshots and test vectors. See an example on the next page.
• Pay attention to the remarks on your returned handout 2’s reports.
Reports’ Submission
Report             Due                                            Grading
Pre-Lab Reports    Submitting via mail before the relevant lab    40%
Post-Lab Reports   Submitting via mail on the next lab            60%
• Reports have to be submitted on time. For personal problems, contact me ahead of time. Unpermitted delays will reduce points.

For the next recitation
Please read Chapter 2 of the lab notes:
• Chapter 2: The RESA’s Busses

Advanced Computer Structure Laboratory
Chapter 2: The RESA’s Busses
Liron David

The RESA’s bus
In order to enable communication between multiple devices, we connect them all to the same wires, called a bus.
The communication over the bus is determined by a protocol. The protocol coordinates the usage of the common resource, namely, the bus.
[Figure: several devices connected to a shared n-bit bus.]

The RESA’s bus
The bus protocol of the RESA computer is a synchronous protocol. This means that there is a global clock signal, which is present in all the devices that are connected to the bus. Every transition of a signal on the bus is synchronized with an edge of the clock.
[Figure: devices connected to a shared n-bit bus and a common CLK.]

The RESA’s bus
The communication via the bus takes place in chunks called transactions, which are the basic "unit of communication". In each transaction, one word of data is transmitted. The party that initiates the transaction is called the master, and the party that is asked to respond is called the slave.
There are two types of transactions:
1. A write transaction is a transaction in which the master wants to send a value to the slave.
2. A read transaction is a transaction in which the master wants to receive a value from the slave.

The RESA’s bus
RESA has two buses, each with its own protocol:
The first, the external serial bus, uses the parallel port connection between the PC and the XSA board. Its protocol describes how data is transferred between the hardware interface (PC) and the Serial Protocol Interface (XSA board).

The RESA’s bus
The second, the internal parallel bus, is placed on the XSA board and connects:
• 2 masters: the Serial Protocol Interface (PC) and the CPU (DLX or another master designed by the student)
• 2 slaves: the main memory (SDRAM) and the monitor slave with logic analyzer (designed by the student)

[Figure: General Structure of the RESA. The PC and the DLX CPU are masters (M), the Monitor slave and the Main Memory are slaves (S), all connected to the n-bit Parallel Bus; the FPGA holds the DLX CPU and the Monitor slave.]

The parallel RESA’s bus
The following signals are transmitted along the bus:
IN_INIT: This signal indicates the status of the master. When low, the master is active and the bus is inaccessible. When high, the master is idle and the bus can be accessed by the PC.
Address Strobe (AS_N): The beginning of a transaction is signaled by a master device by asserting the AS_N signal. This tells all the slaves that, in the next clock cycle, information will be sent on the bus.
Address (A[31:0]): The address signals are used to transmit the address of the requested data item. Note that the address of a data item holds the slave's device name as well as the item's name within the slave. For this purpose, some of the address bits are used to address the slave device, and the rest are used to address the data item within the device.
  A[31:0] layout (from bit 31 down to bit 0): the address of the slave device, followed by the address of the data item within the device.
Data (D[31:0]): The data signals are used to transmit the data item that is being transferred in a transaction. In a write transaction, the master asserts the Data signals, and in a read transaction the slave asserts the Data signals.
Write (WR_N): The WR_N signal is asserted by the master to indicate whether the transaction is a read transaction or a write transaction.
Acknowledge (ACK_N): The slave acknowledges the transmission of the data item by asserting the ACK_N signal. In a write transaction, the slave signals that the data will be read and processed by it in the next clock cycle. In a read transaction, the slave signals that the requested data item will be transmitted in the next clock cycle.

A Read Transaction
[Figure: read transaction waveform. Annotations: the master is active and the bus is inaccessible; in the next clock cycle, information will be sent on the bus; it is a read transaction; the address of the requested data item; the requested data item will be transmitted in the next clock cycle; the requested data item which is being transmitted in the transaction.]

A Write Transaction
[Figure: write transaction waveform. Annotations: the master is active and the bus is inaccessible; in the next clock cycle, information will be sent on the bus; it is a write transaction; the requested address to write to; the requested data item to write; the written data will be read and processed by the slave in the next clock cycle.]

Handout #2: The RESA’s parallel bus
Block Diagram - The RESA’s parallel bus
Consider a CPU that wants to communicate over the RESA bus as a master device. The CPU is connected to the RESA bus via a simple bus interface. The simple bus interface is placed on the FPGA between the CPU and the RESA bus.
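The read and write transactions described in this chapter can be sketched as a small behavioral model. This is a Python illustration, not the lab's VHDL, and it rests on assumptions: the function name `run_transaction`, the `wait_cycles` parameter, the dictionary used as memory, and the choice to hold WR_N until the acknowledge are all hypothetical. Only the signal names and their active-low polarity (WR_N low for a write) are taken from the slides.

```python
def run_transaction(memory, address, write, data=None, wait_cycles=1):
    """Model one bus transaction; return (trace, result).

    trace holds one dict per clock cycle with the active-low control signals.
    """
    if write and data is None:
        raise ValueError("a write transaction needs a data word")
    wr_n = 0 if write else 1  # low = write, high = read
    # Cycle 0: the master asserts AS_N (low) to start the transaction.
    trace = [{"AS_N": 0, "WR_N": wr_n, "ACK_N": 1}]
    # The slave may need some cycles before it is ready (assumed parameter).
    for _ in range(wait_cycles):
        trace.append({"AS_N": 1, "WR_N": wr_n, "ACK_N": 1})
    # The slave asserts ACK_N (low) for one cycle ...
    trace.append({"AS_N": 1, "WR_N": wr_n, "ACK_N": 0})
    # ... and the data word is transferred in the following cycle.
    if write:
        memory[address] = data & 0xFFFFFFFF
        result = None
    else:
        result = memory.get(address, 0)
    trace.append({"AS_N": 1, "WR_N": 1, "ACK_N": 1})
    return trace, result


mem = {}
run_transaction(mem, 0x10, write=True, data=0xABCD)
trace, value = run_transaction(mem, 0x10, write=False)
print(hex(value))
```

The trace makes the ordering explicit: address strobe first, acknowledge later, data one cycle after the acknowledge. The exact number of cycles on the real RESA bus depends on the slave.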
[Figure: block diagram with the CPU, the Bus Interface, the Bus and a Slave. Annotation: "These are given to you".]

Block Diagram - The Bus Interface
Communication between the CPU and the bus interface is implemented by 3 registers: R_AD, R_DO and R_DI.
Given Control Signals: wr_req, rd_req, busy, ACK_N
Required Control Signals: In_init, DONE, AS_N, WR_N, CE_AD, CE_DO, CE_DI, DE_A, DE_D
Missing Blocks: Solving modules (using buffer), Additional Logic, FF’s
[Figure: bus interface block diagram. Registers R_AD (enable CE_AD), R_DO (enable CE_DO) and R_DI (enable CE_DI), clocked by CLK; CPU-side buses AO[31:0], DO[31:0], DI[31:0]; bus-side A[31:0] (driver enable DE_A) and D[31:0] (driver enable DE_D).]

Waveforms – CPU ‘read’ instruction
Waveforms – CPU ‘write’ instruction
Waveforms – CPU ‘read after write’ instruction
[Figures: waveforms only.]

For the next recitation
Please read Chapter 3 of the lab notes:
• Chapter 3: A Simple Slave Device

Advanced Computer Structure Laboratory
Chapter 3: A Simple Slave Device
Liron David

The RESA Architecture
The XSA-3s1000 board contains:
• A Spartan III XC3S1000 FPGA – 1000K gates which can be programmed.
• A CPLD XC9572XL – serves as the interface between the PC parallel port and the FPGA.
• A 32MB Memory – read-write memory, serves as the main memory of the application.
• Other parts of the XSA-3s1000 – such as flash memory, programmable oscillator, LED indicator, VGA and PS2 ports.

The RESA Monitor Program
The RESA program is a suite of programs used to diagnose and set up the RESA. The parts of the RESA program that you are going to use are:
• Configure RESA FPGA.
  - Use Xilinx to make a ‘.bit’ file.
  - Use RESA to program the FPGA from the ‘.bit’ file.
• Access RESA memory.
  - Read blocks of memory after an application (e.g. a CPU) completes its execution.
  - Write (upload) programs in the application's language into the memory (i.e. a DLX assembly program).
• Run and Debug.
  (a) Upload programs.
  (b) The monitor enables one to initiate read and write transactions to single RAM and Slave addresses. This is a very powerful tool when we use a concept called "built-in monitoring" (we will see it later).
  (c) Set single-step mode or continuous mode.

The RESA Monitor Program
• Write programs in assembly language.
• Compile your programs.
• Use the simulator to test your program.
• Generate Graphs.
• Hardware interface: communicates with the FPGA (software-hardware communication protocol).
File types shown on these slides: ‘.txt’, ‘.cod’.

The I/O Control logic
• The I/O control logic serves as the bus interface for the master device and the slave device.
• The RESA parallel bus is drawn within the I/O control logic since both the application and the Monitor Slave access the RESA bus via the I/O control logic.

Input signals of the bus master device
• Clk – clock (shared with the bus master device)
• STEP_EN - This signal is a one clock cycle pulse that causes the master (e.g. a CPU) to perform one step.
• RESET - A reset signal.
• ACK_N - The acknowledge signal is sent by the slave in a bus transaction. This signal is active low.
• DO[31:0] - This is the data-out bus for the master and slave devices (both the bus master and the bus slave share these signals).

Input signals of the bus slave device
• Clk – clock (shared with the bus master device)

Output signals of the bus master device
• AS_N - This is the address strobe signal. This signal is active low.
• MAO[31:0] - This is the address-out bus (no sharing with the slave device).
• MDO[31:0] - This is the data-out bus (no sharing with the slave device).
• WR_OUT_N - This is the R/W signal generated by the bus master. When high it indicates a read operation and when low a write operation.
• IN_INIT - This signal indicates the status of the master. When low, the master is active and the bus is inaccessible. When high, the master is idle and the bus can be accessed by the Monitor program.

Input signals of the bus slave device (continued)
• DO[31:0] - This is the data-out bus (shared with the bus master device).
• AI[9:0] - This is the address-in bus of the slave device. There can be 1024 different addresses for devices in the FPGA.
• WR_IN_N - This is the write signal that is input to the slave device. This signal is active low.
• CARDSEL - This signal indicates that the slave of the current transaction is on the FPGA. It is computed by the I/O control logic from the 22 upper bits of the address in the RESA.

Output signals of the bus slave device
• SDO[31:0] - Data-out bus.
• SACK_N - Ack signal generated by the slave device; it marks the validity of SDO. This signal is active low.

Handout #3: A simple slave device
In this assignment you will get a trivial master device. Your purpose is to read values from this master. However, you can't initiate read transactions from the master device itself, so you will design a slave device that can monitor the requested values, and you will use the monitor program to initiate read transactions from this slave device.
[Figure: the PC (M) and the Master on the FPGA, connected through the n-bit Parallel Bus to the Monitor slave (S) and the Main Memory (S).]

Handout #3: A simple slave device
This slave device is connected to nets of the master device by "private wires" (i.e. the private wires are not part of the bus).
We will be using an extension of this mechanism to design applications (i.e. a DLX) with "built-in monitoring".
[Figure: the same structure, with private wires between the Master and the Monitor slave.]

The Master
The master device we use for this assignment is a 32-bit binary counter connected to a 32x32-bit RAM. The 5 LSB output bits of the counter (4:0) are used as the RAM address bits, while the full 32 bits of the counter output are used as the RAM data input.
In every step this master fills a RAM cell with the corresponding counter value, but it does not initiate any bus transactions and is therefore a degenerate master device. Part of the master is a simple 5-bit counter that indicates the number of executed steps.

The Slave
The slave device in this assignment reads one of the following:
1. The values stored in the 32x32 RAM (reg_out(31:0))
2. The state(3:0) of the master
3. The value output by the counter (step_num(4:0))
4. The writing address (reg_write(4:0))
5. ID(7:0) constant - stores the "code" of your lab group
The address space of the slave device is defined by four addresses: 0x0, 0x20, 0x40 and 0x60.
[Figure: a 32-bit 4x1 mux (MUX32bit 4X1) whose four 32-bit inputs are selected by the addresses 0x0, 0x20, 0x40 and 0x60 and whose output drives SDO.]

The Slave
When the PC monitor program wishes to read the counter's value, it initiates a read bus transaction with the address of the counter's output. The slave device receives this request and routes the counter's output to the SDO bus.
The slave device then acknowledges that the requested data has been sent, and the read transaction is completed.
[Figure: the same mux, with a Slave Control block that generates SACK_N.]

Slave Address Partitioning
AI[9:0] is split into three fields:
Bits    Field      Meaning
9:7     BA[2:0]    chooses a block (1 out of 8)
6:5     PA[1:0]    chooses a page out of 4 pages in a block
4:0     WA[4:0]    chooses a word out of 32 words in a page
Word = 32 bits. There are 8 blocks, each block holds 4 pages, and each page holds 32 words (word 0 to word 31).

Handout #3: Block Diagram
[Figure: block diagram. Master with inputs CLK, Step_en, reset, Reg_adr(4:0) and outputs Reg_out(31:0), Step_num(4:0), State(3:0), Reg_write(4:0). Monitor Slave with the MUX32bit 4X1 (inputs at 0x0, 0x20, 0x40, 0x60), the ID(7:0) component, outputs SDO and SACK_N, and a Slave Control block fed by AI[?:?], CARDSEL and WR_IN_N. I/O Logic between them and the bus; one annotation reads "is given to you".]

I/O control logic inputs and outputs
Wave Forms
[Figure: slave control built from D flip-flops fed by AI[?:?], CARDSEL and WR_IN_N.]
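The slave address partitioning above (BA, PA, WA) can be sketched as plain bit slicing. This is a Python illustration under assumptions: the function name `split_ai` is hypothetical, and the bit positions (BA = AI[9:7], PA = AI[6:5], WA = AI[4:0]) are read off the field widths given on the slide.

```python
def split_ai(ai):
    """Split a 10-bit slave address AI[9:0] into (BA, PA, WA)."""
    if not 0 <= ai < (1 << 10):
        raise ValueError("AI must fit in 10 bits")
    ba = (ai >> 7) & 0b111   # AI[9:7]: block, 1 out of 8
    pa = (ai >> 5) & 0b11    # AI[6:5]: page, 1 out of 4 in a block
    wa = ai & 0b11111        # AI[4:0]: word, 1 out of 32 in a page
    return ba, pa, wa


# The four slave base addresses are the four pages of block 0.
for base in (0x00, 0x20, 0x40, 0x60):
    print(hex(base), split_ai(base))
```

Under this reading, the four addresses 0x0, 0x20, 0x40 and 0x60 differ only in PA, so PA is a natural select input for the 4x1 mux, while WA addresses a word inside the selected page.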
[Figure: wave forms and an optional implementation of slave control. SACK_N is produced from AI[?:?], CARDSEL and WR_IN_N through D flip-flops (outputs Q_1, Q_2) clocked by CLK, using derivation and inversion. Waveform rows: CLK, AS_N, Q_1, Q_2, SACK_N. Annotations: "Depending on the address"; "0 or more – according to the slave".]

Hand Out #3
Design according to the specification:
1. Monitor Slave (the slave device).
2. Group’s ID component.
Run simulations in order to check your design (the Monitor Slave).
Connect all the designed components in such a way that the following values can be read:
1. reg_out(31:0)
2. state(3:0)
3. step_num(4:0)
4. reg_write(4:0)
5. ID(7:0)

Hand Out #3
Implement your design and produce a bit file.
Create Configuration Labels.
Run your implementation on the RESA computer by using the RESA monitor program.
Print:
• Design
• Simulations
• Label report
• Data snapshots of three sequential steps
Make sure that: Step_num[4:0] advances. Reg_write[4:0] advances. ID[7:0] is read.
Pay attention to Handout#2’s remarks.
The simulations that you submit should be consistent with the protocol that we have learnt.

For the next recitation
Please read Chapter 4 of the lab notes:
• Chapter 4: Built-In Self Monitoring

Advanced Computer Structure Laboratory
Chapter 4: Built-In Self Monitoring
Liron David

built-in monitoring
Our goal is to build a standard debugging means, i.e. to be able to run the DLX processor step by step and monitor (i.e. view) the values of registers and control signals.
To enable such monitoring, our designs will include hardware that is responsible for reading various values in the DLX processor without changing the DLX behavior.
In the last recitation we encountered hardware that implements the required functionality (i.e. the Monitor Slave). We will expand that hardware so that it provides more complicated services.

General Structure
The FPGA contains three modules:
1. The Application (e.g. the DLX processor)
2. The Monitor Slave
3. The I/O control logic
Outside the FPGA: the PC monitor program and the Main Memory.
[Figure: General Structure. The PC Monitor Program (M) and the Application (M) are masters; the Monitor Slave (S) and the SDRAM Main Memory (S) are slaves. The Application, the Monitor Slave and the I/O Control Logic reside on the FPGA and are connected through the RESA Bus.]

The Application
Our goal: designing a processor that executes DLX instructions.
The control of the DLX starts each instruction execution in the "fetch" state, passes through a few other states and returns to the "fetch" state.
An instruction execution is an interval of clock cycles between two consecutive entries to the fetch state.
[Figure: state sequence Fetch, Decode, Execute, Memory, Write Back.]

Monitoring Tasks
The DLX’s control FSM has an additional state - init.
We have two modes:
• In single-step mode, each execution of an instruction waits until an appropriate signal arrives from the PC monitor program.
• In continuous mode, the execution of the next instruction is unconditional.
[Figure: state sequence Init, Fetch, Decode, Execute, Memory, Write Back, with edge labels step_en and /step_en.]

The Logic Analyzer
For debug purposes the DLX is running in single step. Using the slave device we read values only when the DLX is in the "init" state. This does not suffice, for example:
• Monitoring the bus activity of the DLX (e.g. in_init).
• Monitoring internal signals of the application over instruction execution (e.g. reg_write[4:0]).
Conclusion: reporting current values is not enough for debugging a design.

The Logic Analyzer
The purpose of this handout: to design a Monitor Slave that can
• Monitor control signals during instruction execution.
• Monitor register values from the application.
These values can later be reported to the PC monitor program.
[Figure: a 32X32 RAM with inputs CLK, WE and Monitored Signals[31:0], next to the state sequence Init, Fetch, Decode, Execute, Memory, Write Back.]
Therefore, we should be able to do the following:
1. Store the monitored signals cycle by cycle during the execution of an instruction.
2. After the instruction's execution is completed, be prepared to answer bus read transactions in which the PC monitor program asks about the sampled values.
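The two requirements above can be sketched as a small behavioral model. This is a Python illustration, not the lab's VHDL, and it makes assumptions: the class and method names are hypothetical, sampling is modeled as one explicit call per clock cycle, and the row count is latched into a status value at the end of the instruction.

```python
class LogicAnalyzer:
    """Behavioral sketch: a 32 x 32-bit sample RAM with a row counter."""

    ROWS = 32

    def __init__(self):
        self.ram = [0] * self.ROWS   # one 32-bit row per sampled clock cycle
        self.counter = 0             # next row to write (5-bit, wraps around)
        self.status = 0              # number of valid rows of the last instruction

    def sample(self, signals):
        """Requirement 1: store the monitored signals of the current cycle."""
        self.ram[self.counter] = signals & 0xFFFFFFFF
        self.counter = (self.counter + 1) % self.ROWS

    def end_of_instruction(self):
        """Latch the number of sampled rows and restart from row 0."""
        self.status = self.counter
        self.counter = 0

    def read(self, row):
        """Requirement 2: answer a read transaction for one stored row."""
        return self.ram[row % self.ROWS]


la = LogicAnalyzer()
for value in (0x1, 0x2, 0x3):        # three cycles of one instruction
    la.sample(value)
la.end_of_instruction()
print(la.status, [la.read(i) for i in range(la.status)])
```

Keeping the row count separately lets the reader of the RAM know how many rows belong to the last instruction, since older rows are not cleared.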
The Logic Analyzer
The part of the Monitor Slave that stores past signals is called the Logic Analyzer. It consists of the following components.

The Logic Analyzer's RAM. This is a 32 x 32-bit RAM that stores the sampled values from the application. In each clock cycle, up to 32 signals (i.e. bits) can be stored. The RAM is filled "row by row" with the sampled values until the execution of the current instruction ends.

The Counter. The 5-bit counter generates the address into which sampled values are stored. After the execution of an application's instruction, the counter is reset by the L.A’s logic. The counter's output value equals the number of cycles that have elapsed since the beginning of the last instruction.

The Mux. The 2x5 bit mux enables the Logic Analyzer's RAM to be addressed by the Counter when sampling (storing) signals, and by the Monitor Slave when reading the stored values from the RAM.

The Status Register. The Status Register latches the value of the Logic Analyzer's counter so that the Monitor Slave can report the number of rows that contain relevant data in the Logic Analyzer's RAM.

Handout #4 Built-in Self Monitoring
Design and implement over the RESA computer: Monitor Slave - including the Logic Analyzer.
The design includes:
1. I/O Logic
2. The Master from Handout #3
3. Monitor:
   • The Slave from Handout #3
   • ID Register from Handout #3
   • Logic Analyzer

Handout #4 Built-in Self Monitoring
[Figure: Monitor Slave block diagram. The L.A block receives Monitored Signals[31:0], CLK, STEP_EN and IN_INIT and provides LA_RAM[31:0] and STATUS = B[7:0]; ID_REG; external inputs Input_1[31:0] and Input_2[31:0]; the Slave mux with inputs A, B, C, D, addressed by reg_adr[4:0], drives SDO[31:0] and SACK_N. Annotation: "There may be additional input/output terminals".]

Waveforms – L.A Control Signals
[Figure: waveforms of the L.A control signals, including WE, over the states Fetch, Decode, Execute, Memory, Write Back. Annotations: "The information could be read from the Monitor Slave only here!"; "Pay attention – one c.c, before & after".]

Handout #4 Built-in Self Monitoring
Recommended schedule:
Next week:
• Designing over the Xilinx platform.
• Beginning simulations.
The week after:
• Finishing simulations.
• Implementing over the RESA computer.
• Monitoring the application using the PC Monitor.

Handout #4 Built-in Self Monitoring
Additional submission guidelines:
1. Submit a simulation showing:
   1. Sampling process of the Logic Analyzer.
   2. Reading process from the Monitor:
      1. Logic Analyzer (the values saved in 1.1)
         1. L.A’s RAM.
         2. L.A’s Status/ID.
      2. External Inputs.
   3. Control signals.
2. Using the RESA monitor program, submit snapshots and graphical waveforms showing two step_en cycles:
   1. Logic Analyzer’s RAM (the sampled signals):
      1. in_init.
      2. state[3:0].
      3. step_num[4:0].
      4. reg_write[4:0].
   2. Logic Analyzer’s Status + ID.
   3. External Inputs:
      1. Master’s RAM.
      2. step_num[4:0].
   4. New Control Signals.
3. Make sure that the sampled signals indeed show that your design is correct, and attach proper documentation.

For the next recitation
Please read Chapter 5 of the lab notes:
• Chapter 5: A Read Machine and A Write Machine

Advanced Computer Structure Laboratory
Chapter 5: A Read Machine and A Write Machine
Liron David

General Structure
The FPGA contains three modules:
1. The Application (e.g. the DLX processor)
2. The Monitor Slave
3. The I/O control logic
Outside the FPGA: the PC monitor program and the Main Memory.
[Figure: General Structure. The PC Monitor Program (M) and the Application (M) are masters; the Monitor Slave (S) and the SDRAM Main Memory (S) are slaves, connected through the I/O Control Logic and the RESA Bus.]

A Read Machine and A Write Machine
This chapter deals with designing a bus master that is capable of initiating bus transactions. We consider two types of machines: a read machine and a write machine.

A Read Machine
The Read Machine is an application that reads the contents of memory (addressed by the counter) and stores the read value in a register.
The Read Machine is connected as a bus master to the I/O Control Logic.

A Read Machine - State Diagram
The functionality of the Read Machine is as follows:
1.
The machine exits the "wait" state when the STEP_EN signal is active.
2. The machine initiates a read transaction in the "fetch" state.
3. The machine waits for an ACK signal during the "wait4ack" state.
4. The machine writes the fetched value in its register when entering the "load" state, and the counter's value is incremented by one.
The reset signal causes the machine to transition to the "wait" state, regardless of its current state, and resets the counter to its initial value (0).

A Write Machine
The Write Machine is an application that writes your favorite value(s) to the memory (addressed by the counter).
The Write Machine is connected as a bus master to the I/O Control Logic.

A Write Machine - State Diagram
The functionality of the Write Machine is as follows:
1. The machine exits the "wait" state when the STEP_EN signal is active.
2. The machine initiates a write transaction in the "store" state.
3. The machine waits for an ACK signal during the "wait4ack" state.
4. The counter's value is incremented by one in the "terminate" state.
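The Write Machine steps above can be sketched as a next-state function. This is a Python illustration, not the VHDL you are asked to write, and it rests on assumptions: the function name is hypothetical, `ack` is modeled as an active-high boolean (i.e. ACK_N is low), the "terminate" state is assumed to return to "wait", and the counter width (32 bits here) is a guess.

```python
def write_machine_step(state, counter, step_en, ack, reset):
    """Return (next_state, next_counter) for one clock edge."""
    if reset:
        return "wait", 0                    # reset wins in every state
    if state == "wait":
        return ("store" if step_en else "wait"), counter
    if state == "store":
        return "wait4ack", counter          # the write transaction starts here
    if state == "wait4ack":
        return ("terminate" if ack else "wait4ack"), counter
    if state == "terminate":
        return "wait", (counter + 1) & 0xFFFFFFFF   # assumed 32-bit counter
    raise ValueError("unknown state: " + state)


# One full cycle: step_en pulse, one cycle without ack, then ack.
state, counter = "wait", 0
for step_en, ack in ((1, 0), (0, 0), (0, 0), (0, 1), (0, 0)):
    state, counter = write_machine_step(state, counter, step_en, ack, reset=0)
    print(state, counter)
```

The Read Machine has the same shape, with "fetch" and "load" in place of "store" and "terminate".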
The reset signal causes the machine to transition to the "wait" state, regardless of its current state, and resets the counter to its initial value (0).

Read Machine
[Figure: block diagram. A VHDL State Machine; a VHDL counter (RAM Address) driving AO[31:0] (32 bits); a Register with clock enable ce, input Din[31:0] and output RDO[31:0] (32 bits). There are additional inputs/outputs.]

Write Machine
[Figure: block diagram. A VHDL State Machine; a VHDL Constant Data block driving WDO[31:0] (32 bits); a VHDL counter (RAM Address) driving AO[31:0] (32 bits).]

VHDL - State Machine
Defining constants that represent the states:

  constant F0_STAY : std_logic_vector(3 downto 0) := "0000";
  constant F1_STAY : std_logic_vector(3 downto 0) := "0001";
  constant F2_STAY : std_logic_vector(3 downto 0) := "0010";
  constant F3_STAY : std_logic_vector(3 downto 0) := "0011";
  constant F1_UP   : std_logic_vector(3 downto 0) := "0100";
  constant F2_UP   : std_logic_vector(3 downto 0) := "0101";
  constant F3_UP   : std_logic_vector(3 downto 0) := "0110";
  constant F0_DOWN : std_logic_vector(3 downto 0) := "0111";
  constant F1_DOWN : std_logic_vector(3 downto 0) := "1000";
  constant F2_DOWN : std_logic_vector(3 downto 0) := "1001";

VHDL - State Machine
Defining transfer function:

  case state is
    when F0_STAY =>
      if (f_in="00") then state <= F0_STAY;
      else state <= F1_UP;
      end if;
    when F0_DOWN =>
      if (f_in="00") then state <= F0_STAY;
      else state <= F1_UP;
      end if;

[Figure: state diagram fragment. F0_STAY stays in place on "00" and moves to F1_UP on "01"/"10".]

VHDL - State Machine
Defining transfer function and defining output:

  case state is
    ...
    when F3_UP =>
      if (f_in="11") then state <= F3_STAY;
      else state <= F2_DOWN;
      end if;
    when others => null;
  end case;

  -- Defining output
  f_num <= "00" when ((state = F0_STAY) or (state = F0_DOWN)) else
           "01" when ((state = F1_STAY) or (state = F1_DOWN) or (state = F1_UP)) else
           "10" when ((state = F2_STAY) or (state = F2_DOWN) or (state = F2_UP)) else
           "11";
  move <= STAY when ((state = F0_STAY) or (state = F1_STAY) or
                     (state = F2_STAY) or (state = F3_STAY)) else
          UP   when ((state = F1_UP) or (state = F2_UP) or (state = F3_UP)) else
          DOWN;

Submission Guidelines
The Post lab should be submitted in two different projects:
1. Write Machine
2. Read Machine
There should be two simulations (two for each project) that show:
• All the control signals of the state machine.
• RDO[31:0], WDO[31:0]
• The Machine’s state.
• A full cycle of the Machine (i.e. 1st simulation).
• A reset disrupted cycle (i.e. 2nd simulation).
RESA monitor program:
• In order to prove that your design (write & read machine) is successful, show a data snapshot before and after the writing & reading activities.
• To complete the proof, present appropriate L.A waveforms.

Waveforms – L.A Control Signals
[Figure: waveform rows CLK, STEP_EN, IN_INIT, AS_N, WR_N, ACK_N, STOP, and AO, DO valid, over the state sequence wait, fetch, wait4ACK, terminate, wait. Annotations: "One or more c.c"; "Single c.c"; "Should be implemented".]

For the next recitation
Please read Chapter 6 of the lab notes:
• Chapter 6: A Load/Store Machine

Advanced Computer Structure Laboratory
Chapter 6: A Load/Store Machine
Liron David

A Load/Store Machine
• This chapter focuses on designing memory accesses in the DLX design.
• To focus on memory accesses, we will consider a primitive application, called the Load/Store Machine.
• The Load/Store Machine executes DLX programs that consist only of simplified load and store instructions.

Load/Store Instructions
The instruction set of the Load/Store Machine consists only of load and store instructions.
Load/Store         Semantics
lw RD R0 imm       RD := M(imm)
sw RD R0 imm       M(imm) := RD
Note that we allow the source register to be only R0. Recall that the value stored in Register R0 is always zero.

Load/Store Instructions
I-Format
Load/store instructions are encoded in the I-Format.
Field     Opcode   RS1   RD   Immediate
Width     6        5     5    16

Encoding
Instruction   IR[31:26]
Load          100011
Store         101011

Memory Accesses
The Load/Store Machine accesses the main memory during the "fetch", "load", and "store" states.
How do we bridge the gap between 4 states for a memory access (Read/Write Machine) and a single state? This is done by cascading state machines.
Communication between the Load/Store Machine and the I/O Control Logic is done via a state machine called the "Memory Access Control" (MAC).
The Memory Access Control resembles the Read and Write Machines.
[Figure: state diagram of the Load/Store Machine control with states Init, Fetch, Decode, Halt, Load, Store and Write Back; edge labels: step_en, /step_en, busy, /busy, reset, /reset.]

Memory Accesses
mr - memory read - active during the "fetch" and "load" states.
mw - memory write - active during the "store" state.
req - either mr or mw is active: req = mr + mw.
busy - a Read/Write transaction is being performed: busy = (/ack) * req.

Memory Accesses - Waveforms
[Figure: waveform rows CLK, req, ack, busy over the MAC state sequence wait4req, wait4ack, next, wait4req.]

DLX Datapath

The GPR environment
Inputs: C, Aadr, Badr, Cadr, gpr_we
Outputs: A, B
It can support one of two operations in each cycle:
1. Write the value of input C in register R[Cadr] if gpr_we = 1.
2. Read the contents of the registers with indexes Aadr and Badr. The outputs A and B are defined by: A = R[Aadr], B = R[Badr].
[Figure: GPR block with inputs C, Aadr, Badr, Cadr, gpr_we, CLK and 32-bit outputs A and B.]

The GPR environment
For register debugging purposes we append a third output D with the same functionality as A and B. The RESA Monitor can read the contents of a register by addressing it with Dadr and reading the output R[Dadr] through the Monitor slave.
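The GPR environment described above can be sketched as a behavioral model. This is a Python illustration under assumptions: the class and method names are hypothetical, the write happens on an explicit `clock` call, and R0 is kept at zero by ignoring writes to index 0, which is only one possible way to honor the "R0 is always zero" rule.

```python
class GPR:
    """Behavioral sketch of a 32 x 32-bit register file with a debug port."""

    def __init__(self):
        self.regs = [0] * 32

    def clock(self, c, cadr, gpr_we):
        """Write C into R[Cadr] on a clock edge if gpr_we = 1."""
        if gpr_we and cadr != 0:          # assumption: writes to R0 are ignored
            self.regs[cadr & 0x1F] = c & 0xFFFFFFFF

    def read(self, aadr, badr, dadr=0):
        """Return (A, B, D) = (R[Aadr], R[Badr], R[Dadr])."""
        return (self.regs[aadr & 0x1F],
                self.regs[badr & 0x1F],
                self.regs[dadr & 0x1F])


gpr = GPR()
gpr.clock(c=0xDEADBEEF, cadr=5, gpr_we=1)
a, b, d = gpr.read(aadr=5, badr=0, dadr=5)
print(hex(a), hex(b), hex(d))
```

The D port reads exactly like A and B, so the monitor can inspect any register without disturbing the values seen by the datapath.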
[Figure: GPR block with inputs C, Aadr, Badr, Cadr, Dadr, gpr_we, CLK and 32-bit outputs A, B and D.]

A schematic diagram of the GPR environment
[Figure: the GPR environment built from RAM blocks with ports d_in[31:0], we, Add[4:0] (5 bits), CLK and d_out[31:0] (32 bits).]

The Control of the Load/Store Machine
The following figure depicts a state diagram of the control of the Load/Store Machine.
[Figure: state diagram with states Init, Fetch, Decode, Halt, Load, Store and Write Back; edge labels: step_en, /step_en, busy, /busy, reset, /reset.]

The following slides step through the states. Each slide shows the state diagram, the IR fields (6-bit Opcode, 5-bit RS1, 5-bit RD, 16-bit Immediate), the GPR, the registers B and C, the PC, and the Main Memory (addresses 0 to 2^32-1) holding the DLX program.
State         Action
Init          Wait for step enable
Fetch         IR = M[PC]
Decode        B = RD; PC = PC+1; determine the next state
Store         M(Imm) = B
Load          C = M(Imm)
Write Back    RD = C
Halt          Machine is stuck till reset

The Datapath of the Load/Store Machine
The following figure depicts a block diagram of the datapath of the Load/Store Machine.
Buses are connected to the I/O control logic, as depicted.
Control signals are omitted from this figure, and you are asked to decide which signals are needed and when they are active.
[Figure: block diagram of the datapath of the Load/Store Machine, with IR, GPR, registers B and C, PC and the Main Memory.]

Translating logical addresses to physical addresses
In the Load/Store Machine, the logical addresses of the machine are limited to a maximum of 64 KWords (range: 0 to 0xFFFF), while the physical address space is 2 MWords. The address translate unit is a simple block that concatenates the 16 address bits of the Load/Store Machine with a 16-bit constant, usually zero, but not more than 0x1F.

Implementation
Please output the state.
[Figure: Load/Store Machine block diagram. The DLX Control State Machine (inputs reset, step_en, clk) exchanges control signals with the Datapath; the MAC receives mr and mw, returns busy, and connects to the I/O Control Logic through ack_n, AS_N, WR_N and In_init; the datapath buses are D_IN[31:0], D_OUT[31:0] and AO[31:0]; the Monitor Slave receives the monitored signals and the inputs A, B.]

Testing a finite state machine
The goal is to test if all the transitions of the finite state machine are correct.
This can be done by "covering" all the transitions of the control by paths (starting in the initial state).
For each path, one needs to compute input values that will cause the control to traverse the path.
The technique of performing simulation with a given sequence of inputs and the expected output sequence is called test vectors.
Check if indeed the reset signal initializes the control, and if the step enable signal causes a transition to the "fetch" state.
[Figure: state diagram of the control with states Init, Fetch, Decode, Halt, Load, Store and Write Back; edge labels: step_en, /step_en, busy, /busy, reset, /reset.]

A simulation environment
The IO_SIMUL Module encapsulates the simplified functionality of the I/O Control Logic, the RESA bus and the memory.
By combining your design with the IO_SIMUL Module, you can simulate your circuit as if it is connected to the RESA bus.
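The address translate unit described earlier in this chapter can be sketched in a few lines. This is a Python illustration under assumptions: the function name `translate` and the parameter name `high_const` are hypothetical; the 16-bit logical range and the 0x1F limit on the constant are the ones stated in the text.

```python
def translate(logical, high_const=0):
    """Concatenate a 16-bit constant (upper half) with a 16-bit logical address."""
    if not 0 <= logical <= 0xFFFF:
        raise ValueError("logical address must fit in 16 bits")
    if not 0 <= high_const <= 0x1F:
        raise ValueError("constant must not exceed 0x1F")
    return (high_const << 16) | logical


print(hex(translate(0x1234)))          # constant 0: address is unchanged
print(hex(translate(0xFFFF, 0x1F)))    # highest reachable physical address
```

With the largest constant and the largest logical address, the result is 0x1FFFFF, which is the last word of a 2 MWord physical address space.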
The Load/Store Machine interacts with other devices through the RESA bus; therefore, manually generating the signals fed to the Load/Store Machine by the RESA bus is not a trivial task. To enable a simulation environment in which you do not need to determine the values of the RESA bus signals, a module called IO_SIMUL was designed.

Testing
• Testing of RTL instructions (replacing the I/O Logic with the IO_SIMUL module; simulation).
• Testing executions of whole instructions (simulation).
• Testing executions of whole instructions (replacing the IO_SIMUL module with the I/O Logic, and implementing the design).

For the next recitation
Please read Chapter 7 of the lab notes:
• Chapter 7: A simplified DLX

Advanced Computer Structure Laboratory
Chapter 7: A simplified DLX
Liron David

A simplified DLX
In this recitation we describe a simplified DLX architecture, which you will be implementing on the RESA-2.

Instruction Formats
There are two instruction formats:
1. An instruction in the I-Type format is divided into four fields, depicted below.
   Opcode (6 bits) | RS1 (5 bits) | RD (5 bits) | Immediate (16 bits)
   IR[31:26] holds the operation's encoding.
2. An instruction in the R-Type format is divided into five fields, depicted below.
   Opcode (6 bits) | RS1 (5 bits) | RS2 (5 bits) | RD (5 bits) | X (5 bits) | Function (6 bits)
   $IR[31:26] = 0^6$; IR[5:0] holds the operation's encoding.

Instruction Set
We list below the instruction set of the simplified DLX.
• imm denotes the value of the immediate field in an I-Type instruction.
• sext(imm) denotes the 2's complement sign extension of imm to 32 bits.
• The architectural registers of the simplified DLX are all 32 bits wide.
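The two instruction formats and sext(imm) can be illustrated by splitting a 32-bit instruction word into its fields. This is a minimal Python sketch. The slides state only that IR[31:26] is the opcode and IR[5:0] the function; the remaining bit positions are derived here from the field order and widths in the figures and should be read as an assumption. The function and key names are chosen for illustration.

```python
def sext(imm: int, bits: int = 16) -> int:
    """Two's complement sign extension of a `bits`-wide value to 32 bits."""
    imm &= (1 << bits) - 1
    if imm >> (bits - 1):                        # sign bit set
        imm |= 0xFFFFFFFF & ~((1 << bits) - 1)   # fill the upper bits with ones
    return imm


def decode_i_type(ir: int) -> dict:
    """Split an I-Type word: Opcode(6) RS1(5) RD(5) Immediate(16)."""
    return {
        "opcode": (ir >> 26) & 0x3F,   # IR[31:26]
        "rs1": (ir >> 21) & 0x1F,      # IR[25:21] (derived position)
        "rd": (ir >> 16) & 0x1F,       # IR[20:16] (derived position)
        "imm": ir & 0xFFFF,            # IR[15:0]
    }


def decode_r_type(ir: int) -> dict:
    """Split an R-Type word: Opcode(6) RS1(5) RS2(5) RD(5) X(5) Function(6)."""
    return {
        "opcode": (ir >> 26) & 0x3F,   # IR[31:26], all zeros for R-Type
        "rs1": (ir >> 21) & 0x1F,      # derived position
        "rs2": (ir >> 16) & 0x1F,      # derived position
        "rd": (ir >> 11) & 0x1F,       # derived position
        "x": (ir >> 6) & 0x1F,         # derived position
        "function": ir & 0x3F,         # IR[5:0]
    }


# An arbitrary I-Type word with RS1 = 3, RD = 7 and immediate 0xFFFA (-6).
word = (0b100011 << 26) | (3 << 21) | (7 << 16) | 0xFFFA
fields = decode_i_type(word)
```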
[Tables: the I-Type instructions (format: Opcode 6 bits, RS1 5 bits, RD 5 bits, Immediate 16 bits) and the R-Type instructions (format: Opcode 6 bits, RS1 5 bits, RS2 5 bits, RD 5 bits, X 5 bits, Function 6 bits).]

Encoding of the Instruction Set
[Tables: encoding of the instruction set.]

Implementation
The Datapath of the simplified DLX

Architectural Registers
The architectural registers of the simplified DLX are all 32 bits wide:
• 32 General Purpose Registers (GPR): R0 to R31. Note that R0 always holds the value 0;
• Program Counter (PC);
• Instruction Register (IR);
• Special Registers: MAR, MDR, A, B and C.

ALU environment
The ALU supports:
• 2's complement integer addition and subtraction.
• Bitwise logical instructions.
• Comparison instructions.

Shifter environment
The shifter is a 32-bit left/right logical shifter, meaning that a zero is pushed in from the right (left) in case of a left (right) shift.
The control inputs are:
• shift - indicates whether a shift should take place (otherwise the output equals the input).
• right - indicates whether the shift is a right shift.

The GPR environment
The GPR environment is identical to that of the Load/Store Machine.

Access to the memory is done via the Memory Access Control module, as described for the Load/Store Machine.

Control
The reset signal causes a transition in the control of the DLX to the "init" state.
• step_en - from the I/O control logic.
• busy - from the memory access control.
• D1 .. D12 - correspond to the decoding of the instructions.
• else - corresponds to an illegal instruction.
• bt (branch taken) - corresponds to the event that the condition of a conditional branch is satisfied.

The control signals
The control signals are used to communicate between the Datapath and the Control.
The active control signals in each state
[Table: the active control signals in each state.]

Examples
[Figures: the paths through the control state diagram for the following instructions.]
• lw RD RS1 imm
• jalr RS1
• sub RD RS1 RS2
• beqz RD RS1 imm

Hand Out #7
Advanced Computer Structure Laboratory
Chapter 7: A simplified DLX – Part II
• This Handout includes four sections.
• Recommended schedule and guidelines are in the Handout.
• The weight of this handout is 400 points.
• Questions 2 and 4 require the approval of the lab's engineer (the form is located on the website); please attach it to the submitted project.
• Please attach to your project the timing report produced by Xilinx, which verifies that your design meets the timing requirements.
• The programming assignment will be published in the following weeks.
copyrights © Moti Medina

Building blocks
• The GPR environment
  – The GPR environment is identical to that of the Load/Store Machine.
  – Please implement AEQZ.
  – A CADR selection mechanism should be implemented.
• The IR environment
  – Inputs: IR_CE, CLK.
  – Outputs: IR_OUT[31:0], sext(imm).
  – $sext(imm[15:0]) = imm[15]^{16} \cdot imm[15:0]$
• The PC environment
  – 32-bit register with a RESET port.
• The MMU
  – Input: AO[31:0].
  – Output: $0^{8} \cdot AO[23:0]$
• ALU environment
  – Inputs: A[31:0], B[31:0], ALUF[2:0], TEST, ADD.
  – Outputs: ALU_OUT[31:0], NEG.
  – Reminder #1: Let $A[n-1:0], B[n-1:0] \in \{0,1\}^n$, and let $[\cdot]$ denote the two's complement representation. Then:
    $[A[n-1:0]] - [B[n-1:0]] = [A[n-1:0]] + [\neg B[n-1:0]] + 1$
  – Reminder #2: $B' \in \{0,1\}^n$; $B' = XOR(B, ADD)$
• ALU environment (cont.)
  – We suggest that you use three 16-bit adder/subtractors from the Xilinx library (ADSU16) to build a 32-bit Conditional Sum Adder.
  – Use the ADDSUB16 component in the following way:
    [Figure: usage of the ADDSUB16 component.]
  – Reminder #3: "Computer Structure Lecture Notes" by Dr. Guy Even.
    • Chapter 8 – Addition (8.4 Conditional Sum Adder)
    • Chapter 10 – Signed Addition
  – Reminder #4: "Computer Architecture - Complexity and Correctness" by S. M. Müller and W. J. Paul.

ALU: Tasks performed in the control states
  State        Operation
  Decode       add
  ALU          op, op = add/sub/and/or/xor
  AluI         add
  TestI        rel, rel = lt, eq, gt, le, ge, ne
  Adr. Comp.   add
  B.Taken      add
  JR           add
  SavePC       add
  JALR         add

ALU: Control Signals
Signals that control the functionality of the ALU:
• ALUF[2:0]
• add (active during states: Decode, AluI, Adr.Comp., B.Taken, SavePC, JR, JALR).
• test (active during states: TestI).

ALUF[2:0] – arithmetic / logical ALU operations (IR[2:0] = func[2:0]):
  011   add
  010   sub
  110   and
  101   or
  100   xor

ALUF[2:0] – test conditions (IR[28:26] = opcode[2:0]):
  001   gt
  010   eq
  011   ge
  100   lt
  101   ne
  110   le

ALU: Implementation
[Figure: the ALU. Inputs A[31:0] and B[31:0] feed ADD-SUB(32), AND(32), OR(32) and XOR(32) blocks; MUX(32) stages controlled by F[2:0], add and test select ALU_OUT[31:0]. The adder/subtractor also produces S[31:0], ovf and neg, which feed Comparator(32); when test is active the output is $0^{31} \cdot COMP\_OUT$.]

ALU: Implementation (cont.)
[Figure: the comparator. Inputs: S[31:0] (through ZERO(32)), neg and F[2:0]; built from INV, AND and OR gates; output: COMP_OUT.]

"Register B"
The instructions in which register B is loaded:
• add
• sub
• and
• or
• xor
• store
Register B is not involved in computations during instructions in which it need not be loaded. Therefore, functionality is correct. Always loading register B (during the Decode state) shortens the path in the Control State Machine when executing instructions that need register B loaded.

Good Luck
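The behavior of the ALU described in this handout can be summarized by a small functional model. This is a minimal Python sketch under stated assumptions, not the schematic you are asked to build. The ALUF[2:0] encodings used here (011 add, 010 sub, 110 and, 101 or, 100 xor; test conditions with bit 2 = lt, bit 1 = eq, bit 0 = gt) are a reading of the slide's tables, and the comparison is assumed to be a signed (two's complement) comparison of A and B. Subtraction follows Reminder #1: A - B = A + NOT(B) + 1.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(a: int, b: int, aluf: int, add: bool = False, test: bool = False) -> int:
    """Functional model of the ALU (assumed encodings, see lead-in)."""
    a &= MASK32
    b &= MASK32
    if test:
        sa, sb = to_signed(a), to_signed(b)
        comp_out = (
            (bool(aluf & 0b100) and sa < sb)      # F[2]: lt
            or (bool(aluf & 0b010) and sa == sb)  # F[1]: eq
            or (bool(aluf & 0b001) and sa > sb)   # F[0]: gt
        )
        return int(comp_out)                      # 31 zeros followed by COMP_OUT
    if add or aluf == 0b011:
        return (a + b) & MASK32
    if aluf == 0b010:
        return (a + (~b & MASK32) + 1) & MASK32   # A - B = A + NOT(B) + 1
    if aluf == 0b110:
        return a & b
    if aluf == 0b101:
        return a | b
    if aluf == 0b100:
        return a ^ b
    raise ValueError("unsupported ALUF encoding")


difference = alu(3, 5, 0b010)   # 3 - 5, wrapped to 32 bits
```

Note how the add signal overrides ALUF: in states such as Decode or Adr. Comp. the ALU adds regardless of the bits currently in the instruction register.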
Advanced Computer Structure Laboratory
DLX Recitation

Problem #1: Convert to DLX's Assembly
LEGEND: r16 = f, r17 = g, r18 = h, r19 = i, r20 = j, r21 = k

        if (i==j) goto L1;          xor  r1  r19 r20
                                    beqz r1  1
        f = g + h;                  add  r16 r17 r18
    L1: f = f - i;                  sub  r16 r16 r19

Problem #2: Convert to DLX's Assembly
Address of A[0] = Astart

    LOOP: g = g + A[i];             addi r4  r0  Astart
                                    add  r1  r4  r19
                                    lw   r2  r1  0
                                    add  r17 r17 r2
          i = i + j;                add  r19 r19 r20
          if (i != h) goto LOOP;    xor  r3  r19 r18
                                    bnez r3  -6

Exam, Moed A (first sitting), 5764 (2003/04), Question 4
We wish to add a new I-type instruction to the machine language of the Simplified DLX:

    Chkbit17 RD RS1 imm

This instruction updates RD as follows:

$$RD = \begin{cases} 0^{31} \cdot 1 & \text{if } RS1[17] = 1 \\ 0^{32} & \text{otherwise} \end{cases}$$

Propose an implementation of the DLX that supports the new instruction while making the smallest possible changes to the datapath (points will be deducted for excessive changes).
1. List the changes required in the datapath in order to support execution of the new instruction.
2. Propose an extension to the state diagram of the control in order to support execution of the new instruction. Draw the path of control states that the state machine traverses when executing the new instruction. For each state along this path (new and old), describe the RTL instruction performed in it and the active control signals.

Answer:
4.1) We add (marked in red):
• A mux.
• Zero padding: concatenates 31 zeros on the left.
• We tap bit 17 of register A.
[Figure: the added datapath elements, labeled chk17, Chk17mux (inputs 1 and 0), Zero padding and A[17].]

4.2) We add one more state, CHK17. The execution path of the instruction is marked in blue; CHK17 is entered if OPCODE = OPCODE(chkbit17).
Note that the active control signals are listed, so no other state is harmed or changed. We add the state CHK17 (in blue) and we are done.
• CHK17: $C = 0^{31} \cdot A[17]$; active control signals: Chk17, Cce.
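The value written by Chkbit17 can be checked with a one-line model. This minimal Python sketch assumes that bit indices count from the least significant bit (bit 0), as in the IR[31:26] notation used in the recitation; the function name is made up for illustration.

```python
def chkbit17(rs1: int) -> int:
    """Return 31 zeros followed by bit 17 of rs1, as a 32-bit value."""
    return (rs1 >> 17) & 1


set_case = chkbit17(1 << 17)                   # only bit 17 set
clear_case = chkbit17(0xFFFFFFFF ^ (1 << 17))  # every bit set except bit 17
```

The result depends on a single bit of RS1, which is why the suggested datapath change needs only a wire from A[17], zero padding and a mux.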