Advanced Computer Structure Laboratory
Chapter 1: Orientation
Liron David

Course Website: http://www.eng.tau.ac.il/~marko

Course’s Staff
• Prof. Guy Even: guy@eng.tau.ac.il
• Lab Assistant: Marko Markov, marko@eng.tau.ac.il
• Recitations: Liron David, lirondavid@gmail.com
The RESA computer
The RESA computer is based on the XSA-3s1000 board, which contains:
1. An FPGA (Field Programmable Gate Array) with 1000K gates.
2. 32 MB of SDRAM.
3. A direct interface to a PC.
The FPGA is a device that can be programmed while it is on the board.
The lab’s goal is to design and implement a simplified DLX CPU on the FPGA, while the SDRAM serves as the main memory of this CPU.

Work flow
1. Design Entry: edit the design using the software. Use a top-down approach. Add remarks to your designs.
2. Simulation: test your design. Find mistakes such as wrong wire connections, wrong polarities, incompatible signal names, etc.
3. Implementation: a configuration file (called a '.bit' file) is created from your design. This stage is fully automatic.
4. Running your design on the RESA.
Schedule
Week No. | Recitation Subject | Handout #
1 | Orientation | 1
1 | The RESA’s Bus | 2
2 | Bus Slave (with I/O control logic) | 3
3 | Monitor slave | 4
4 | Monitor slave (cont.) | 4
5 | Read/Write machine | 5
6 | Load/Store machine | 6
7 | Test Load/Store machine | 6
8 | DLX: Design | 7
9 | DLX: Simulation + Implementation | 7
10 | DLX: Testing + Timing | 7
11 | DLX: Program | 7
Report Submission
• The reports should be submitted via e-mail as one PDF file, labeled with the group number (without names and IDs).
• Every answer/design should be followed by an explanation.
• Explain the main signals’ behavior on waveforms, data snapshots and test vectors. See an example on the next page.
• On the purple-labeled weeks, a recitation will be held.
• Recommended reading material: the matching chapter of the lab notes.
• Every group contains 2-3 students.
• Pay attention to the remarks on your returned Handout #2 reports.
Reports’ Submission
Report | Due | Grading
Pre-Lab Reports | Submitted via mail before the relevant lab | 40%
Post-Lab Reports | Submitted via mail on the next lab | 60%
• Reports have to be submitted on time. For personal problems, contact me in advance. Unpermitted delays will reduce points.

For the next recitation
Please read Chapter 2 of the lab notes:
• Chapter 2: The RESA’s Busses
Chapter 2: The RESA’s Busses
Liron David

The RESA’s bus
In order to enable communication between multiple devices, we connect all of them to the same set of wires, called a bus.
The communication over the bus is determined by a protocol. The protocol coordinates the usage of the common resource, namely, the bus.
The bus protocol of the RESA computer is a synchronous protocol. This means that there is a global clock signal (CLK) that is present in all the devices connected to the bus. Every transition of a signal on the bus is synchronized with an edge of the clock.
The communication via the bus takes place in chunks called transactions, which are the basic "unit of communication". In each transaction, one word of data is transmitted.
The party that initiates the transaction is called the master, and the party that is asked to respond is called the slave.
There are two types of transactions:
1. A write transaction is a transaction in which the master wants to send a value to the slave.
2. A read transaction is a transaction in which the master wants to receive a value from the slave.
The RESA’s buses
The RESA has two buses, each with its own protocol:
1. The external serial bus uses a parallel-port connection between the PC and the XSA board. Its protocol describes how data is transferred between the hardware interface (PC) and the Serial Protocol Interface (XSA board).
2. The internal parallel bus is placed on the XSA board and connects:
   • 2 masters: the Serial Protocol Interface (PC) and the CPU (a DLX or another master designed by the student).
   • 2 slaves: the main memory (SDRAM) and the monitor slave with a logic analyzer (designed by the student).
[Figure: General Structure of the RESA. The PC and the DLX CPU act as masters (M), and the main memory and the monitor slave act as slaves (S); all four are connected by the n-bit parallel bus.]
The RESA’s parallel bus
The following signals are transmitted along the bus:
IN_INIT: This signal indicates the status of the master. When low, the master is active and the bus is inaccessible. When high, the master is idle and the bus can be accessed by the PC.
Address Strobe (AS_N): A master device signals the beginning of a transaction by asserting the AS_N signal (active low). This tells all the slaves that, in the next clock cycle, information will be sent on the bus.
Address (A[31:0]): The address signals are used to transmit the address of the requested data item.
Note that the address of a data item holds the slave's device name as well as the item's name within the slave. For this purpose, some of the address bits are used to address the slave device, and the rest are used to address the data item within the device:
A[31:0] = [ address of the slave device (upper bits, toward bit 31) | address of the data item within the device (lower bits, toward bit 0) ]
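As an illustration of this split, here is a minimal Python sketch. It is a behavioral model, not VHDL. It assumes the split that the I/O control logic uses for slaves on the FPGA, as described in Chapter 3: CARDSEL is computed from the 22 upper bits A[31:10], and those slaves see the remaining 10 bits as AI[9:0]. The device number `fpga_device` is a hypothetical parameter; the lab notes do not give its value.

```python
ITEM_BITS = 10                      # A[9:0]: item within the FPGA's devices (AI[9:0])
DEVICE_BITS = 32 - ITEM_BITS        # A[31:10]: the 22 upper bits
ITEM_MASK = (1 << ITEM_BITS) - 1


def split_address(a):
    """Split a 32-bit bus address into (device field, item field)."""
    if not 0 <= a < (1 << 32):
        raise ValueError("A[31:0] is a 32-bit address")
    return a >> ITEM_BITS, a & ITEM_MASK


def cardsel(a, fpga_device):
    """CARDSEL-style decode: True when the upper 22 bits name the FPGA.

    fpga_device is a hypothetical device number chosen for illustration.
    """
    device, _ = split_address(a)
    return device == fpga_device


# Example: address 0xC40 names device 3, item 0x40.
print(split_address(0xC40))   # (3, 64)
```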
Data (D[31:0]): The data signals carry the data item that is transferred in a transaction. In a write transaction, the master drives the data signals, and in a read transaction, the slave drives them.
Write (WR_N): The WR_N signal is driven by the master to indicate whether the transaction is a read transaction (WR_N high) or a write transaction (WR_N low).
Acknowledge (ACK_N): The slave acknowledges the transmission of the data item by asserting the ACK_N signal (active low). In a write transaction, the slave signals that it will read and process the data in the next clock cycle. In a read transaction, the slave signals that the requested data item will be transmitted in the next clock cycle.
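To make the cycle-by-cycle meaning of AS_N and ACK_N concrete, here is a minimal Python sketch that builds the bus values of one transaction. The exact timing is an assumption made only for this sketch: the address and WR_N are driven from the cycle after AS_N until the ACK_N cycle, and the data item is on D[31:0] in the cycle that follows ACK_N. The names `Cycle`, `transaction` and `data_after_ack` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cycle:
    """Bus values during one clock cycle. Strobes are active low (0 = asserted)."""
    as_n: int = 1
    wr_n: int = 1
    ack_n: int = 1
    addr: Optional[int] = None   # A[31:0]; None = not driven
    data: Optional[int] = None   # D[31:0]; None = not driven


def transaction(memory: Dict[int, int], addr: int,
                write_data: Optional[int] = None, slave_wait: int = 0) -> List[Cycle]:
    """Build the cycles of one read (write_data is None) or write transaction.

    Assumed timing for this sketch:
      cycle 0                    master asserts AS_N
      cycles 1 .. 1+slave_wait   A[31:0] and WR_N are driven; the slave asserts
                                 ACK_N in the last of these cycles
      the following cycle        the data item is on D[31:0]
    """
    is_write = write_data is not None
    wr_n = 0 if is_write else 1                      # WR_N low = write
    trace = [Cycle(as_n=0)]
    for i in range(slave_wait + 1):
        ack_n = 0 if i == slave_wait else 1
        trace.append(Cycle(wr_n=wr_n, ack_n=ack_n, addr=addr))
    if is_write:
        trace.append(Cycle(data=write_data))         # master drives D
        memory[addr] = write_data                    # slave processes it in this cycle
    else:
        trace.append(Cycle(data=memory.get(addr, 0)))  # slave drives D
    return trace


def data_after_ack(trace: List[Cycle]) -> Optional[int]:
    """Return D[31:0] from the cycle that follows the ACK_N cycle."""
    ack_cycle = next(i for i, c in enumerate(trace) if c.ack_n == 0)
    return trace[ack_cycle + 1].data


mem = {0x40: 7}
read = transaction(mem, 0x40, slave_wait=2)   # slave waits 2 extra cycles
print(len(read), data_after_ack(read))        # 5 7
```

The `slave_wait` parameter reflects the fact that a slave may take one or more cycles before acknowledging. The master does not know this latency in advance, so it must watch ACK_N rather than count cycles.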
A Read Transaction
[Figure: read transaction waveform. IN_INIT low: the master is active and the bus is inaccessible. AS_N: in the next clock cycle, information will be sent on the bus. WR_N: it is a read transaction. A[31:0]: the address of the requested data item. ACK_N: the requested data item will be transmitted in the next clock cycle. D[31:0]: the requested data item being transmitted in the transaction.]
A Write Transaction
[Figure: write transaction waveform. AS_N: in the next clock cycle, information will be sent on the bus. WR_N: it is a write transaction. A[31:0]: the requested address to write to. D[31:0]: the data item to write. ACK_N: the written data will be read and processed by the slave in the next clock cycle.]

Handout #2: The RESA’s parallel bus
Block Diagram - The RESA’s parallel bus
Consider a CPU that wants to communicate over the RESA bus as a master device. The CPU is connected to the RESA bus via a simple bus interface, which is placed on the FPGA between the CPU and the RESA bus.
[Figure: the CPU, the bus interface, the RESA bus and a slave; IN_INIT low indicates that the master is active and the bus is inaccessible. The blocks marked in the figure are given to you.]
Block Diagram - The Bus Interface
Communication between the CPU and the bus interface is implemented by 3 registers: R_AD, R_DO and R_DI.
• Given control signals: wr_req, rd_req, busy, ACK_N.
• Required control signals: In_init, DONE, AS_N, WR_N, CE_AD, CE_DO, CE_DI, DE_A, DE_D.
• Missing blocks: solving modules (using buffers), additional logic, FF’s.
[Figure: the bus interface, clocked by CLK, with the registers R_AD, R_DO and R_DI between the CPU-side signals AO[31:0], DO[31:0] and DI[31:0] and the bus signals A[31:0] and D[31:0].]
[Figure: Waveforms: CPU ‘read’ instruction.]
[Figure: Waveforms: CPU ‘write’ instruction.]
[Figure: Waveforms: CPU ‘read after write’ instruction.]
For the next recitation
Please read Chapter 3 of the lab notes:
• Chapter 3: A Simple Slave Device
Chapter 3: A Simple Slave Device
Liron David
The RESA Architecture
The XSA-3s1000 board contains:
• A Spartan III XC3S1000 FPGA: 1000K gates that can be programmed while the FPGA is on the board.
• A 32 MB memory: read-write memory that serves as the main memory of the application.
• A CPLD XC9572XL: serves as the interface between the PC parallel port and the FPGA.
• Other parts, such as flash memory, a programmable oscillator, an LED indicator, and VGA and PS2 ports.

The RESA Monitor Program
The RESA program is a suite of programs used to diagnose and set up the RESA. The parts of the RESA program that you are going to use are:
• Configure the RESA FPGA:
  • Use Xilinx to make a ‘.bit’ file.
  • Use RESA to program the FPGA from the ‘.bit’ file.
• Access RESA memory: read blocks of memory after an application (e.g. a CPU) completes its execution.
• Run and Debug:
  (a) Upload programs: write (upload) programs in the application's language (i.e. a DLX assembly program) into the memory.
  (b) Initiate read and write transactions to single RAM and slave addresses. This is a very powerful tool when we use a concept called "built-in monitoring" (we will see it later).
  (c) Set single-step mode or continuous mode.
• Write programs in assembly language and compile them (‘.txt’ and ‘.cod’ files).
• Use the simulator to test your program and generate graphs.
• Hardware interface: the software-hardware communication protocol used to communicate with the FPGA.
The I/O Control logic
• The I/O control logic serves as the bus interface for the master device and the slave device.
• The RESA parallel bus is drawn within the I/O control logic, since both the application and the Monitor Slave access the RESA bus via the I/O control logic.

Input signals of the bus master device
• Clk: clock (shared with the bus slave device).
• STEP_EN: a one-clock-cycle pulse that causes the master (e.g. a CPU) to perform one step.
• RESET: a reset signal.
• ACK_N: the acknowledge signal sent by the slave in a bus transaction. This signal is active low.
• DO[31:0]: the data-out bus for the master and slave devices (both the bus master and the bus slave share these signals).

Output signals of the bus master device
• AS_N: the address strobe signal. This signal is active low.
• MAO[31:0]: the address-out bus (not shared with the slave device).
• MDO[31:0]: the data-out bus (not shared with the slave device).
• WR_OUT_N: the R/W signal generated by the bus master. When high, it indicates a read operation; when low, a write operation.
• IN_INIT: indicates the status of the master. When low, the master is active and the bus is inaccessible. When high, the master is idle and the bus can be accessed by the Monitor program.

Input signals of the bus slave device
• Clk: clock (shared with the bus master device).
• DO[31:0]: the data-out bus (shared with the bus master device).
• AI[9:0]: the address-in bus of the slave device. There can be 1024 different addresses for devices in the FPGA.
• WR_IN_N: the write signal that is input to the slave device. This signal is active low.
• CARDSEL: indicates that the slave of the current transaction is on the FPGA. It is computed by the I/O control logic from the 22 upper bits of the address in the RESA.

Output signals of the bus slave device
• SDO[31:0]: the data-out bus.
• SACK_N: the acknowledge signal generated by the slave device; it marks the validity of SDO. This signal is active low.

Handout #3: A simple slave device
In this assignment you will get a trivial master device. Your purpose is to read values from this master. However, you cannot initiate read transactions from the master device itself. Therefore, you will design a slave device that can monitor the requested values, and you will use the monitor program to initiate read transactions from this slave device.
This slave device is connected to nets of the master device by "private wires" (i.e. the private wires are not part of the bus). We will use an extension of this mechanism to design applications (e.g. a DLX) with "built-in monitoring".
[Figure: on the FPGA, the PC and the Master are masters (M) on the parallel bus, the Monitor slave and the main memory are slaves (S), and private wires connect the Master to the Monitor slave.]

The Master
The master device we use for this assignment is a 32-bit binary counter connected to a 32x32-bit RAM. The 5 LSBs of the counter output (bits 4:0) are used as the RAM address bits, while the full 32-bit counter output is used as the RAM data input.
In every step, this master fills RAM cells with the corresponding counter values, but it does not initiate any bus transactions; it is therefore a degenerate master device.
Part of the master is a simple 5-bit counter that indicates the number of executed steps.
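Here is a behavioral Python sketch of this degenerate master. It is not the VHDL you will design. It assumes that each step performs exactly one RAM write followed by one increment of both counters; the text above does not specify this. The class name `CounterMaster` is hypothetical. The attribute names follow the signals the slave reads (reg_write, step_num).

```python
class CounterMaster:
    """Hypothetical model of the Handout #3 master: a 32-bit counter feeding a 32x32 RAM."""

    def __init__(self):
        self.counter = 0        # 32-bit binary counter
        self.ram = [0] * 32     # 32 words x 32 bits
        self.step_num = 0       # 5-bit count of executed steps

    @property
    def reg_write(self):
        """RAM write address: the 5 LSBs of the counter."""
        return self.counter & 0x1F

    def step(self):
        """One step (assumed): store the counter at RAM[counter(4:0)], then count."""
        self.ram[self.reg_write] = self.counter
        self.counter = (self.counter + 1) & 0xFFFFFFFF
        self.step_num = (self.step_num + 1) & 0x1F


m = CounterMaster()
for _ in range(40):
    m.step()
print(m.ram[0], m.ram[8], m.step_num)   # 32 8 8
```

After 40 steps the RAM has wrapped once. Cell 0 now holds 32 instead of 0, which is why reading several consecutive steps through the slave shows both reg_write and step_num advancing.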
The Slave
The slave device in this assignment reads one of the following:
1. The values stored in the 32x32 RAM (reg_out(31:0)).
2. The state(3:0) of the master.
3. The value output by the counter (step_num(4:0)).
4. The writing address (reg_write(4:0)).
5. ID(7:0): a constant that stores the "code" of your lab group.
The address space of the slave device is defined by four addresses, 0x0, 0x20, 0x40 and 0x60, which select the inputs of a 32-bit 4x1 MUX that drives the SDO bus.
When the PC monitor program wishes to read the counter's value, it initiates a read bus transaction with the address of the counter's output. The slave device receives this request and routes the counter's output to the SDO bus. The slave device (its Slave Control, which drives SACK_N) then acknowledges that the requested data has been sent, and the read transaction is completed.

Slave Address Partitioning
The 10-bit slave address AI[9:0] is partitioned as follows:
• BA[2:0] = AI[9:7]: chooses a block (1 out of 8).
• PA[1:0] = AI[6:5]: chooses a page out of the 4 pages in a block.
• WA[4:0] = AI[4:0]: chooses a word out of the 32 words in a page (word = 32 bits).
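A minimal Python sketch of the slave's address decoding and output MUX follows. The four addresses 0x0, 0x20, 0x40 and 0x60 differ only in PA[1:0], because 0x20 = 1 << 5. In this sketch PA therefore drives the MUX select, and WA picks a word within a page. Two things are assumptions: the assignment of values to pages, which the handout leaves to you, and ignoring BA, which makes all 8 blocks alias. The functions `decode_ai` and `slave_sdo` are hypothetical names.

```python
def decode_ai(ai):
    """Split the slave address AI[9:0] into (BA[2:0], PA[1:0], WA[4:0])."""
    if not 0 <= ai < 1024:
        raise ValueError("AI is a 10-bit address")
    ba = (ai >> 7) & 0b111     # block: 1 out of 8
    pa = (ai >> 5) & 0b11      # page: 1 out of 4 in a block
    wa = ai & 0b11111          # word: 1 out of 32 in a page
    return ba, pa, wa


def slave_sdo(ai, pages):
    """Model the 32-bit 4x1 MUX: PA selects the input at 0x0/0x20/0x40/0x60.

    pages maps a page number (0..3) to a function of the word address WA.
    BA is ignored here, so all 8 blocks alias (an assumption of this sketch).
    """
    _, pa, wa = decode_ai(ai)
    return pages[pa](wa) & 0xFFFFFFFF


# Hypothetical page assignment, for illustration only:
ram = [100 + i for i in range(32)]   # stands in for reg_out(31:0)
pages = {
    0: lambda wa: ram[wa],           # 0x00..0x1F: RAM word WA
    1: lambda wa: 0x1F,              # 0x20: e.g. step_num(4:0)
    2: lambda wa: 0x3,               # 0x40: e.g. state(3:0)
    3: lambda wa: 0xA5,              # 0x60: e.g. ID(7:0)
}
print(decode_ai(0x25), slave_sdo(0x05, pages))   # (0, 1, 5) 105
```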
Handout #3: Block Diagram
[Figure: the Master receives CLK, Step_en and reset from the I/O logic and Reg_adr(4:0) from the Monitor Slave, and outputs Reg_out(31:0), Step_num(4:0), State(3:0) and Reg_write(4:0). The Monitor Slave contains the ID(7:0) component, a 32-bit 4x1 MUX with inputs at 0x0, 0x20, 0x40 and 0x60 that drives SDO, and the Slave Control, which receives AI[?:?], CARDSEL and WR_IN_N and produces SACK_N (marked "is given to you"). The I/O control logic inputs and outputs are AI[9:0], CARDSEL, WR_IN_N, SDO and SACK_N.]
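Wave Forms
[Figure: Slave Control waveforms of CLK, AS_N, CARDSEL, WR_IN_N, AI[?:?] and SACK_N. SACK_N is asserted 0 or more cycles after the request, according to the slave (depending on the address).]
An optional implementation of slave control:
[Figure: a derivation and inversion of AS_N feeds a chain of D flip-flops (outputs Q_1, Q_2) clocked by CLK; combined with CARDSEL, WR_IN_N and AI[?:?], the flip-flop outputs produce SACK_N.]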
Handout #3
Design according to the specification:
1. Monitor Slave (the slave device).
2. Group’s ID component.
Run simulations in order to check your design (the Monitor Slave).
Connect all the designed components in such a way that the following values can be read:
1. reg_out(31:0)
2. state(3:0)
3. step_num(4:0)
4. reg_write(4:0)
5. ID(7:0)
Implement your design and produce a bit file.
Create configuration labels.
Run your implementation on the RESA computer by using the RESA monitor program.
Print:
• Design.
• Simulations: the simulations that you submit should be consistent with the protocol that we have learnt.
• Label report.
• Data snapshots of three sequential steps. Make sure that step_num[4:0] advances, reg_write[4:0] advances, and ID[7:0] is read.
Pay attention to Handout #2’s remarks.

For the next recitation
Please read Chapter 4 of the lab notes:
• Chapter 4: Built-In Self Monitoring
Chapter 4: Built-In Self Monitoring
Liron David

Built-in monitoring
Our goal is to build a standard debugging facility, i.e. to be able to run the DLX processor step by step and monitor (i.e. view) the values of registers and control signals.
To enable such monitoring, our designs will include hardware that is responsible for reading various values in the DLX processor without changing the DLX behavior.
In the last recitation we encountered hardware that implements the required functionality (i.e. the Monitor Slave); we will expand that hardware so that it provides us with more complicated services.

General Structure
The FPGA contains three modules:
1. The Application (e.g. the DLX processor)
2. The Monitor Slave
3. The I/O control logic
[Figure: the PC monitor program and the Application are masters (M), and the SDRAM main memory and the Monitor Slave are slaves (S); all are connected by the RESA bus through the I/O control logic on the FPGA.]

The Application
Our goal: designing a processor that executes DLX instructions.
The control of the DLX starts each instruction execution in the "fetch" state, passes through a few other states (Decode, Execute, Memory, Write Back) and returns to the "fetch" state.
An instruction execution is an interval of clock cycles between two consecutive entries to the "fetch" state.
Monitoring Tasks
The DLX’s control FSM has an additional state: init.
We have two modes:
• In single-step mode, the execution of each instruction waits until an appropriate signal (step_en) arrives from the PC monitor program.
• In continuous mode, the execution of the next instruction is unconditional.
[Figure: the control FSM. Init stays in Init on /step_en and moves to Fetch on step_en; the instruction then passes through Fetch, Decode, Execute, Memory and Write Back.]
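As an illustration, here is a minimal Python sketch of the control FSM with the extra init state. Two assumptions apply: Write Back returns to Init, and continuous mode is modelled by holding step_en high. The lab's actual mechanism for continuous mode may differ. The names `next_state` and `run` are hypothetical.

```python
INIT, FETCH, DECODE, EXECUTE, MEMORY, WRITE_BACK = (
    "init", "fetch", "decode", "execute", "memory", "write_back")

# Unconditional transitions after Fetch (assumed: Write Back returns to Init).
FOLLOWING = {FETCH: DECODE, DECODE: EXECUTE, EXECUTE: MEMORY,
             MEMORY: WRITE_BACK, WRITE_BACK: INIT}


def next_state(state, step_en):
    """Init waits for step_en; every other state advances unconditionally."""
    if state == INIT:
        return FETCH if step_en else INIT
    return FOLLOWING[state]


def run(step_en_trace, state=INIT):
    """Return the state in each clock cycle for a given step_en sequence."""
    states = []
    for step_en in step_en_trace:
        states.append(state)
        state = next_state(state, step_en)
    return states


# Single-step mode: one step_en pulse executes exactly one instruction.
print(run([0, 1, 0, 0, 0, 0, 0, 0]))
# Continuous mode, modelled here by holding step_en high.
print(run([1] * 12).count(FETCH))    # 2
```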
The Logic Analyzer
For debugging purposes, the DLX runs in single-step mode. Using the slave device, we read values only when the DLX is in the "init" state.
This does not suffice, for example, for:
• Monitoring the bus activity of the DLX (e.g. in_init).
• Monitoring internal signals of the application during instruction execution (e.g. reg_write[4:0]).
Conclusion: reporting current values is not enough for debugging a design.
The purpose of this handout is to design a Monitor Slave that can:
• Monitor control signals during instruction execution.
• Monitor register values from the application.
These values can later be reported to the PC monitor program.
Therefore, we should be able to do the following:
1. Store the monitored signals cycle by cycle during the execution of an instruction.
2. After the instruction's execution is completed, be prepared to answer bus read transactions in which the PC monitor program asks about the sampled values.
[Figure: Monitored Signals[31:0] are written into a 32X32 RAM (inputs CLK and WE) during instruction execution (Fetch, Decode, Execute, Memory, Write Back), while Init waits for step_en.]
The part of the Monitor Slave that stores past signals is called the Logic Analyzer. It consists of:
• The Logic Analyzer's RAM: a 32 x 32-bit RAM that stores the sampled values from the application. In each clock cycle, up to 32 signals (i.e. bits) can be stored. The RAM is filled "row by row" with the sampled values until the execution of the current instruction ends.
• The Counter: a 5-bit counter that generates the address into which sampled values are stored. After the execution of an application's instruction, the counter is reset by the L.A.’s logic. The counter's output value equals the number of cycles that have elapsed since the beginning of the last instruction.
• The Mux: a 2x5-bit mux that lets the Logic Analyzer's RAM be addressed by the counter when sampling (storing) signals, and by the Monitor Slave when reading the stored values from the RAM.
• The Status Register: latches the value of the Logic Analyzer's counter, so that the Monitor Slave can report the number of rows in the Logic Analyzer's RAM that contain relevant data.

Handout #4: Built-in Self Monitoring
Design and implement over the RESA computer:
Monitor Slave, including the Logic Analyzer.
The design includes:
1. I/O Logic
2. The Master from Handout #3
3. Monitor:
   • The Slave from Handout #3
   • ID Register from Handout #3
   • Logic Analyzer
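The Logic Analyzer's parts (RAM, counter, mux and status register) can be sketched behaviorally in Python as below. This is a hypothetical model under stated assumptions, not a required structure. It assumes `sample` is called once per clock cycle while WE is active, and `end_instruction` once when the instruction ends. The packing of monitored bits in the usage example is made up for illustration.

```python
class LogicAnalyzer:
    """Hypothetical behavioral model of the Logic Analyzer."""

    def __init__(self):
        self.ram = [0] * 32   # 32 x 32-bit sample RAM
        self.counter = 0      # 5-bit write-address counter
        self.status = 0       # status register: rows holding relevant data

    def sample(self, monitored):
        """One clock cycle with WE active: the mux selects the counter as address."""
        self.ram[self.counter] = monitored & 0xFFFFFFFF
        self.counter = (self.counter + 1) & 0x1F   # wraps after 32 samples

    def end_instruction(self):
        """Latch the counter into the status register, then reset the counter."""
        self.status = self.counter
        self.counter = 0

    def read(self, row):
        """Monitor-slave read: the mux selects the slave's address instead."""
        return self.ram[row & 0x1F]


la = LogicAnalyzer()
for cycle in range(5):                 # e.g. Fetch .. Write Back
    la.sample((cycle << 8) | 0x01)     # hypothetical packing of monitored bits
la.end_instruction()
print(la.status, hex(la.read(4)))      # 5 0x401
```

Because the counter is 5 bits wide, an instruction that runs for 32 or more cycles wraps the counter, and the status register can no longer tell how many rows are valid.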
[Figure: the Monitor Slave. The L.A. samples Monitored Signals[31:0] on CLK (WE active during Fetch, Decode, Execute, Memory and Write Back) into LA_RAM[31:0]; STATUS = B[7:0]; ID_REG; external inputs Input_1[31:0] and Input_2[31:0]; mux inputs A, B, C and D; the Slave drives SDO[31:0] and SACK_N; other connections include reg_adr[4:0], STEP_EN and IN_INIT. There may be additional input/output terminals.]
Waveforms – L.A Control Signals
[Figure: L.A. control signal waveforms. The information can be read from the Monitor Slave only in the marked interval. Pay attention to the single clock cycle before and after.]
Recommended schedule:
Next week:
• Designing over the Xilinx platform.
• Beginning simulations.
The week after:
• Finishing simulations.
• Implementing over the RESA computer.
• Monitoring the application using the PC Monitor.
Additional submission guidelines:
1. Submit a simulation showing:
   1.1. The sampling process of the Logic Analyzer.
   1.2. The reading process from the Monitor:
        a. The Logic Analyzer (the values saved in 1.1): the L.A.’s RAM and the L.A.’s Status/ID.
        b. External inputs.
        c. Control signals.
2. Using the RESA monitor program, submit snapshots and graphical waveforms showing two step_en cycles of:
   2.1. The Logic Analyzer’s RAM (the sampled signals): in_init, state[3:0], step_num[4:0] and reg_write[4:0].
   2.2. The Logic Analyzer’s Status + ID.
   2.3. External inputs: the Master’s RAM and step_num[4:0].
   2.4. New control signals.
3. Make sure that the sampled signals indeed demonstrate that your design is correct, and attach proper documentation.
For the next recitation
Please read Chapter 5 of the lab notes:
• Chapter 5: A Read Machine and A Write Machine
Chapter 5: A Read Machine and A Write Machine
Liron David
A Read Machine and A Write Machine
This chapter deals with designing a bus master that is capable of initiating bus transactions. We consider two types of machines: a read machine and a write machine.

A Read Machine
The Read Machine is an application that reads the contents of memory (addressed by the counter) and stores the read value in a register. The Read Machine is connected as a bus master to the I/O Control Logic.

A Read Machine - State Diagram
The functionality of the Read Machine is as follows:
1. The machine exits the "wait" state when the STEP_EN signal is active.
2. The machine initiates a read transaction in the "fetch" state.
3. The machine waits for an ACK signal during the "wait4ack" state.
4. The machine writes the fetched value into its register when entering the "load" state, and the counter's value is incremented by one.
The reset signal causes the machine to transition to the "wait" state, regardless of its current state, and resets the counter to its initial value (0).

A Write Machine
The Write Machine is an application that writes your favorite value(s) to the memory (addressed by the counter). The Write Machine is connected as a bus master to the I/O Control Logic.
A Write Machine - State Diagram
The functionality of the Write Machine is as follows:
1. The machine exits the "wait" state when the STEP_EN signal is active.
2. The machine initiates a write transaction in the "store" state.
3. The machine waits for an ACK signal during the "wait4ack" state.
4. The counter's value is incremented by one in the "terminate" state.
The reset signal causes the machine to transition to the "wait" state, regardless of its current state, and resets the counter to its initial value (0).
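Both state diagrams share one shape: wait, an access state, wait4ack, and a final state. The Python sketch below models them with one hypothetical class, `SimpleBusMaster`, as a cycle-level reference for your simulations. The following are assumptions, not taken from the lab notes: AS_N is asserted only in the access state; the fetched word is valid on `data_in` in the cycle in which ACK is seen; the final state returns to wait; and the counter increments on leaving the final state.

```python
class SimpleBusMaster:
    """Hypothetical cycle-level model of the Read Machine / Write Machine control."""

    def __init__(self, kind):
        if kind not in ("read", "write"):
            raise ValueError("kind must be 'read' or 'write'")
        self.kind = kind
        self.access = "fetch" if kind == "read" else "store"
        self.final = "load" if kind == "read" else "terminate"
        self.state = "wait"
        self.counter = 0      # memory address counter (drives AO[31:0])
        self.register = 0     # Read Machine only: the fetched word (RDO[31:0])

    def as_n(self):
        """Assumption: AS_N is asserted (low) only in the fetch/store state."""
        return 0 if self.state == self.access else 1

    def clock(self, step_en=0, ack_n=1, data_in=0, reset=0):
        """Advance the machine by one rising clock edge."""
        if reset:                              # reset wins from any state
            self.state, self.counter = "wait", 0
        elif self.state == "wait":
            if step_en:
                self.state = self.access
        elif self.state == self.access:
            self.state = "wait4ack"
        elif self.state == "wait4ack":
            if ack_n == 0:
                if self.kind == "read":
                    self.register = data_in    # assumed valid when ACK is seen
                self.state = self.final
        else:                                  # "load" or "terminate"
            self.counter += 1
            self.state = "wait"


rm = SimpleBusMaster("read")
rm.clock(step_en=1)                  # wait -> fetch (AS_N asserted)
rm.clock()                           # fetch -> wait4ack
rm.clock(ack_n=0, data_in=0x1234)    # wait4ack -> load, register loaded
rm.clock()                           # load -> wait, counter incremented
print(rm.state, rm.counter, hex(rm.register))   # wait 1 0x1234
```

The same sequence with `reset=1` injected at any point reproduces the "reset disrupted cycle" that the submission guidelines ask you to simulate.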
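Read Machine
[Figure: a VHDL state machine; a VHDL counter (RAM address) that drives AO[31:0]; and a 32-bit register with clock enable (ce) that latches Din[31:0] and outputs RDO[31:0]. There are additional inputs/outputs.]
Write Machine
[Figure: a VHDL state machine; a VHDL counter (RAM address) that drives AO[31:0]; and a VHDL constant-data block that drives WDO[31:0].]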
VHDL - State Machine
Defining constants that represent the states:

constant F0_STAY : std_logic_vector(3 downto 0) := "0000";
constant F1_STAY : std_logic_vector(3 downto 0) := "0001";
constant F2_STAY : std_logic_vector(3 downto 0) := "0010";
constant F3_STAY : std_logic_vector(3 downto 0) := "0011";
constant F1_UP   : std_logic_vector(3 downto 0) := "0100";
constant F2_UP   : std_logic_vector(3 downto 0) := "0101";
constant F3_UP   : std_logic_vector(3 downto 0) := "0110";
constant F0_DOWN : std_logic_vector(3 downto 0) := "0111";
constant F1_DOWN : std_logic_vector(3 downto 0) := "1000";
constant F2_DOWN : std_logic_vector(3 downto 0) := "1001";

-- the state register, declared in the architecture
signal state : std_logic_vector(3 downto 0) := F0_STAY;
Defining the transfer function (inside a clocked process):

process (clk)  -- clk: the machine's clock
begin
  if rising_edge(clk) then
    case state is
      when F0_STAY =>
        if (f_in = "00") then
          state <= F0_STAY;
        else
          state <= F1_UP;
        end if;
      when F0_DOWN =>
        if (f_in = "00") then
          state <= F0_STAY;
        else
          state <= F1_UP;
        end if;
      -- ... (transitions of the remaining states)
      when F3_UP =>
        if (f_in = "11") then
          state <= F3_STAY;
        else
          state <= F2_DOWN;
        end if;
      when others => null;
    end case;
  end if;
end process;

[Figure: from F0_STAY, input "00" keeps the machine in F0_STAY, and input "01" or "10" moves it to F1_UP.]

Defining the output (concurrent assignments):

f_num <= "00" when ((state = F0_STAY) or (state = F0_DOWN)) else
         "01" when ((state = F1_STAY) or (state = F1_DOWN) or (state = F1_UP)) else
         "10" when ((state = F2_STAY) or (state = F2_DOWN) or (state = F2_UP)) else
         "11";

-- STAY, UP and DOWN are constants defined elsewhere in the design
move <= STAY when ((state = F0_STAY) or (state = F1_STAY) or
                   (state = F2_STAY) or (state = F3_STAY)) else
        UP   when ((state = F1_UP) or (state = F2_UP) or (state = F3_UP)) else
        DOWN;
Submission Guidelines
The post-lab should be submitted as two different projects:
1. Write Machine
2. Read Machine
There should be two simulations for each project, showing:
• All the control signals of the state machine.
• RDO[31:0] and WDO[31:0].
• The machine’s state.
• A full cycle of the machine (i.e. the 1st simulation).
• A cycle disrupted by reset (i.e. the 2nd simulation).
RESA monitor program:
• In order to prove that your design (write & read machine) is successful, show a data snapshot before and after the writing & reading activities.
• To complete the proof, present appropriate L.A. waveforms.

Waveforms – L.A Control Signals
[Figure: CLK, STEP_EN, IN_INIT, AS_N, WR_N, ACK_N and AO, DO across the states wait, fetch, wait4ACK, terminate and wait. Annotations: "one or more c.c", "single c.c", "AO, DO valid", and "STOP: should be implemented".]

For the next recitation
Please read Chapter 6 of the lab notes:
• Chapter 6: A Load/Store Machine
Chapter 6: A Load/Store Machine
Liron David

A Load/Store Machine
• This chapter focuses on designing memory accesses in the DLX design.
• To focus on memory accesses, we will consider a primitive application, called the Load/Store Machine.
• The Load/Store Machine executes DLX programs that consist only of simplified load and store instructions.
Load/Store Instructions
The instruction set of the Load/Store Machine consists only of load and store instructions.
Load/Store | Semantics
lw RD R0 imm | RD := M(imm)
sw RD R0 imm | M(imm) := RD
Note that we allow the source register to be only R0. Recall that the value stored in register R0 is always zero.
Load/Store Instructions: I-Format
Load/store instructions are encoded in the I-Format:
Opcode (6 bits) | RS1 (5 bits) | RD (5 bits) | Immediate (16 bits)
Instruction | Encoding of IR[31:26]
Load | 100011
Store | 101011
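To check instruction words by hand, for example when preparing test programs, here is a small Python sketch of I-format encoding and decoding. It uses the field widths and opcodes in the table above. The function names are hypothetical. Treating the immediate as an unsigned 16-bit value is an assumption; it matches the 0 to 0xFFFF logical address range of the Load/Store Machine.

```python
OPCODES = {"lw": 0b100011, "sw": 0b101011}   # IR[31:26]


def encode(mnemonic, rd, rs1, imm):
    """Assemble 'lw RD R0 imm' / 'sw RD R0 imm' into a 32-bit I-format word."""
    if rs1 != 0:
        raise ValueError("the Load/Store Machine only allows RS1 = R0")
    if not 0 <= rd < 32:
        raise ValueError("RD is a 5-bit register index")
    if not 0 <= imm <= 0xFFFF:
        raise ValueError("the immediate is 16 bits wide")
    return (OPCODES[mnemonic] << 26) | (rs1 << 21) | (rd << 16) | imm


def decode(ir):
    """Split an I-format word into its four fields."""
    return {
        "opcode": (ir >> 26) & 0x3F,   # IR[31:26]
        "rs1": (ir >> 21) & 0x1F,      # IR[25:21]
        "rd": (ir >> 16) & 0x1F,       # IR[20:16]
        "imm": ir & 0xFFFF,            # IR[15:0]
    }


print(hex(encode("lw", 3, 0, 0x10)))   # 0x8c030010
print(hex(encode("sw", 5, 0, 0x20)))   # 0xac050020
```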
Memory Accesses
The Load/Store Machine accesses the main memory during the "fetch", "load" and "store" states.
How do we bridge the gap between the 4 states of a memory access (Read/Write Machine) and a single state? This is done by cascading state machines.
Communication between the Load/Store Machine and the I/O Control Logic is done via a state machine called the "Memory Access Control" (MAC). The Memory Access Control resembles the Read and Write Machines.
[Figure: the control of the Load/Store Machine with states Init, Fetch, Decode, Load, Store, Write Back and Halt. Init waits for step_en; Fetch, Load and Store stay in their state while busy and proceed on /busy; Halt stays until reset.]
Memory Accesses - Waveforms
The DLX and the MAC communicate through the following signals:
• mr (memory read): active during the "fetch" and "load" states.
• mw (memory write): active during the "store" state.
• req: either mr or mw is active (req = mr + mw).
• busy: a read/write transaction is being performed (busy = (/ack) · req).
[Figure: waveforms of CLK, req, ack and busy while the MAC moves through the states wait4req, wait4ack, next and back to wait4req.]
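The cascading idea can be checked with a small Python sketch of the MAC. The state names and the busy equation come from the waveform above. Everything else is an assumption of this sketch: the MAC leaves wait4req when req is seen, leaves wait4ack when ack arrives, and spends one cycle in next. `simulate` also uses a fake ack pulse in place of the I/O control logic and a control that sits in Fetch until it sees /busy. All names are hypothetical.

```python
class MAC:
    """Hypothetical model of the Memory Access Control (states from the waveform)."""

    def __init__(self):
        self.state = "wait4req"

    @staticmethod
    def busy(req, ack):
        """busy = (/ack) * req."""
        return bool(req and not ack)

    def clock(self, req, ack):
        if self.state == "wait4req":
            if req:
                self.state = "wait4ack"     # a bus transaction is started here
        elif self.state == "wait4ack":
            if ack:
                self.state = "next"
        else:                               # "next"
            self.state = "wait4req"


def simulate(ack_at, cycles=6):
    """Control sits in Fetch (req = mr = 1) until it sees /busy."""
    mac, in_fetch = MAC(), True
    states, busy_trace = [], []
    for t in range(cycles):
        req = 1 if in_fetch else 0
        ack = 1 if t == ack_at else 0      # stand-in for the I/O control logic
        busy = mac.busy(req, ack)
        states.append(mac.state)
        busy_trace.append(int(busy))
        mac.clock(req, ack)
        if req and not busy:
            in_fetch = False               # control advances to Decode
    return states, busy_trace


print(simulate(ack_at=3))
```

From the control's point of view, the whole multi-cycle transaction collapses into "stay in Fetch while busy". This is what lets a single control state stand for a 4-state bus access.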
The GPR environment
Inputs: C, Aadr, Badr, Cadr, gpr_we. Outputs: A, B.
It can support one of two operations in each cycle:
1. Write the value of input C into register R[Cadr] if gpr_we = 1.
2. Read the contents of the registers with indexes Aadr and Badr. The outputs A and B are defined by:
   A = R[Aadr], B = R[Badr].
For register debugging purposes, we append a third output D with the same functionality as A and B. The RESA Monitor can read the contents of the D register by addressing it with Dadr and reading the output R[Dadr] through the Monitor slave.
[Figure: A schematic diagram of the GPR environment. Three 32x32 RAMs (ports d_in[31:0], Add[4:0], we, d_out[31:0], CLK) share the write data C and gpr_we, are addressed by Aadr, Badr and Dadr for reading (and by Cadr for writing), and produce A, B and D.]
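Here is a behavioral Python sketch of the GPR environment, including the debug port D. It assumes that R0 stays zero because writes to R0 are ignored. That is one common way to enforce it, but the lab notes do not say how your design must do it. The class name `GPR` is hypothetical.

```python
class GPR:
    """Hypothetical model of the GPR environment with the extra debug port D."""

    def __init__(self):
        self.regs = [0] * 32              # R0 .. R31, 32 bits each

    def read(self, aadr, badr, dadr=0):
        """Combinational reads: A = R[Aadr], B = R[Badr], D = R[Dadr]."""
        return self.regs[aadr], self.regs[badr], self.regs[dadr]

    def clock(self, c, cadr, gpr_we):
        """On the clock edge: R[Cadr] := C if gpr_we = 1.

        Writes to R0 are ignored so that R0 always reads as zero (an
        assumption about how R0 is enforced).
        """
        if gpr_we and cadr != 0:
            self.regs[cadr] = c & 0xFFFFFFFF


g = GPR()
g.clock(c=5, cadr=3, gpr_we=1)
g.clock(c=9, cadr=0, gpr_we=1)     # ignored: R0 stays 0
print(g.read(3, 0, dadr=3))        # (5, 0, 5)
```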
The Control of the Load/Store Machine
The following figure depicts a state diagram of the control of the Load/Store Machine.
[Figure: states Init, Fetch, Decode, Load, Store, Write Back and Halt. Init stays on /step_en and moves to Fetch on step_en; Fetch, Load and Store stay in their state while busy and advance on /busy; Halt stays on /reset and leaves on reset.]
The RTL instructions of each state are:
Init: wait for step enable.
Fetch: IR = M[PC].
Decode: B = RD, PC = PC+1, and determine the next state.
Load: C = M(Imm).
Store: M(Imm) = B.
Write Back: RD = C.
Halt: the machine is stuck until reset.
[Figure: for each state, the IR fields (Opcode, RS1, RD, Immediate), the GPR environment and the main memory (addresses 0 to 2^32-1, holding the DLX program at PC), with the active transfer highlighted.]
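The RTL list above can be exercised with a small Python sketch. The sketch is hypothetical and RTL-level, with one `clock` call per control state. It makes several assumptions that the lab notes do not state: every memory access completes within its state (busy is never asserted); Decode sends any opcode other than load/store to Halt; Store and Write Back return to Init; and reset does not clear the PC.

```python
LW, SW = 0b100011, 0b101011


class LoadStoreMachine:
    """Hypothetical RTL-level model of the Load/Store Machine control."""

    def __init__(self, memory):
        self.mem = memory           # dict: word address -> 32-bit word
        self.gpr = [0] * 32         # R0 is never written
        self.pc = 0
        self.ir = self.b = self.c = 0
        self.state = "init"

    def clock(self, step_en=0, reset=0):
        """Perform the RTL of the current state (busy is never asserted here)."""
        s = self.state
        rd = (self.ir >> 16) & 0x1F
        if reset:
            self.state = "init"
        elif s == "init":                                 # wait for step enable
            self.state = "fetch" if step_en else "init"
        elif s == "fetch":
            self.ir = self.mem.get(self.pc, 0)            # IR = M[PC]
            self.state = "decode"
        elif s == "decode":
            self.b = self.gpr[rd]                         # B = RD
            self.pc += 1                                  # PC = PC+1
            opcode = (self.ir >> 26) & 0x3F               # determine the next state
            self.state = {LW: "load", SW: "store"}.get(opcode, "halt")
        elif s == "load":
            self.c = self.mem.get(self.ir & 0xFFFF, 0)    # C = M(Imm)
            self.state = "write_back"
        elif s == "store":
            self.mem[self.ir & 0xFFFF] = self.b           # M(Imm) = B
            self.state = "init"
        elif s == "write_back":
            if rd != 0:
                self.gpr[rd] = self.c                     # RD = C
            self.state = "init"
        # "halt": the machine stays here until reset


def run_instruction(m):
    """Give one step_en pulse and clock until the control returns to init or halts."""
    m.clock(step_en=1)
    while m.state not in ("init", "halt"):
        m.clock()


mem = {0: 0x8C010010,   # lw R1 R0 0x10
       1: 0xAC010011,   # sw R1 R0 0x11
       0x10: 7}
m = LoadStoreMachine(mem)
run_instruction(m)
run_instruction(m)
print(m.gpr[1], mem[0x11], m.pc)   # 7 7 2
```

A model like this can produce the expected register and memory contents for each step, which you can then compare with the data snapshots read through the Monitor Slave.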
The Datapath of the Load/Store Machine
The following figure depicts a block diagram of the datapath of the Load/Store Machine. Buses are connected to the I/O control logic, as depicted.
Control signals are omitted from this figure, and you are asked to decide which signals are needed and when they are active.
[Figure: datapath with the IR (Opcode, RS1, RD, Immediate), the PC, the GPR environment (inputs C, Aadr, Badr, Cadr, gpr_we, CLK; outputs A, B) and the main memory holding the DLX program (addresses 0 to 2^32-1).]
Translating logical addresses to physical addresses
In the Load/Store Machine, the logical addresses of the machine are limited to a maximum of 64 KWords (range: 0 to 0xFFFF), while the physical address space is 2 MWords.
The address translation unit is a simple block that concatenates the 16 address bits of the Load/Store Machine with a 16-bit constant, which is usually zero but not more than 0x1F. Since 0x1F occupies only 5 bits, every translated address fits in 16 + 5 = 21 bits, i.e. within the 2 MWords of physical memory.
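The concatenation can be written out in a few lines of Python. The function name `translate` and the parameter name `const` are hypothetical. The constant goes into the upper 16 bits and the logical address into the lower 16 bits, as described above.

```python
LOGICAL_MAX = 0xFFFF   # 64 KWords of logical address space
CONST_MAX = 0x1F       # largest allowed upper constant


def translate(logical, const=0):
    """Physical address = const (upper 16 bits) concatenated with logical (lower 16)."""
    if not 0 <= logical <= LOGICAL_MAX:
        raise ValueError("logical addresses are 16 bits")
    if not 0 <= const <= CONST_MAX:
        raise ValueError("the constant must not exceed 0x1F")
    return (const << 16) | logical


print(hex(translate(0x1234)))          # 0x1234
print(hex(translate(0xFFFF, 0x1F)))    # 0x1fffff, the last of the 2 MWords
```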
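Implementation
[Figure: the Load/Store Machine consists of the DLX control state machine (inputs reset, step_en and busy; please output the state), the datapath, driven by the control signals, and the MAC (inputs mr and mw; output busy). The I/O control logic (clk, step_en, ack_n, In_init, AS_N, WR_N, AO[31:0], D_OUT[31:0], D_IN[31:0]) connects the machine to the bus, and internal signals are fed to the Monitor Slave as inputs A and B.]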
Testing a finite state machine
step_en
Init
Fetch
/busy
/step_en
Decode
The goal is to test if all the transitions of the finite state
machine are correct.
This can be done by "covering" all the transitions of the
control by paths (starting in the initial state).
/reset
Halt
busy
Load
/busy
reset
busy
/busy
Store
Write Back
For each path, one needs to compute input values that will
cause the control to traverse the path.
Simulating with a given sequence of inputs and comparing the outputs against an expected output sequence is called testing with test vectors.
Check whether the reset signal indeed initializes the control, and whether the step enable signal (step_en) causes a transition to the "fetch" state.
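To make the test-vector idea concrete, here is a Python sketch of a behavioral model of the control plus one covering path. The transitions (which states wait on busy, where Store and Write Back return) and the `op` input that stands for the Decode outcome are assumptions read off the diagram; adjust them to your own design. Simulation starts in Halt so that the first vector really checks that reset initializes the control.

```python
def next_state(state, reset=0, step_en=0, busy=0, op=None):
    """Hypothetical Load/Store control; transitions are assumed from the diagram."""
    if reset:                          # reset always returns the control to Init
        return "Init"
    if state == "Init":
        return "Fetch" if step_en else "Init"
    if state == "Fetch":               # wait while the memory access is busy
        return "Fetch" if busy else "Decode"
    if state == "Decode":              # assumed: any other opcode leads to Halt
        return {"load": "Load", "store": "Store"}.get(op, "Halt")
    if state == "Load":
        return "Load" if busy else "WriteBack"
    if state == "Store":
        return "Store" if busy else "Init"
    if state == "WriteBack":
        return "Init"
    return "Halt"                      # Halt: stuck until reset

def apply_vectors(vectors, state="Halt"):
    """Apply (inputs, expected_state) pairs; return a list of mismatches."""
    errors = []
    for cycle, (inputs, expected) in enumerate(vectors):
        state = next_state(state, **inputs)
        if state != expected:
            errors.append((cycle, expected, state))
    return errors

# One covering path: reset, step_en, a busy Fetch, a Load that waits once.
LOAD_PATH = [
    ({"reset": 1}, "Init"),
    ({"step_en": 1}, "Fetch"),
    ({"busy": 1}, "Fetch"),
    ({"busy": 0}, "Decode"),
    ({"op": "load"}, "Load"),
    ({"busy": 1}, "Load"),
    ({"busy": 0}, "WriteBack"),
    ({}, "Init"),
]
```

`apply_vectors(LOAD_PATH)` returns an empty list when every state matches; a failing vector shows up as (cycle, expected state, reached state). Further paths (Store, Halt, reset from the middle of a path) are added the same way until every transition is covered.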
A simulation environment
The Load/Store Machine interacts with other devices through the RESA bus; therefore, it is not a trivial task to manually generate the signals that the RESA bus feeds to the Load/Store Machine.
To enable a simulation environment in which you do not need to determine the values of the RESA bus signals, a module called IO_SIMUL was designed. The IO_SIMUL module encapsulates the simplified functionality of the I/O control logic, the RESA bus and the memory.
By combining your design with the IO_SIMUL module, you can simulate your circuit as if it were connected to the RESA bus.
Testing
• Testing RTL instructions (simulation, with the I/O control logic replaced by IO_SIMUL).
• Testing the execution of whole instructions (simulation).
• Testing the execution of whole instructions (with IO_SIMUL replaced by the I/O control logic, and implementing the design).
For the next recitation
Please read Chapter 7 of the lab notes:
• Chapter 7: A simplified DLX
Advanced Computer Structure Laboratory
Chapter 7: A simplified DLX
Liron David
In this recitation we describe a simplified DLX architecture, which you will implement on the RESA-2.
Instruction Formats
There are two instruction formats:
1. An instruction in the I-Type format is divided into four fields:
   Opcode (6 bits) | RS1 (5) | RD (5) | Immediate (16)
   IR[31:26] holds the operation’s encoding.
2. An instruction in the R-Type format is divided into five fields, plus an unused 5-bit field X:
   Opcode (6 bits) | RS1 (5) | RS2 (5) | RD (5) | X (5) | Function (6)
   In an R-Type instruction $IR[31:26] = 0^{6}$, and IR[5:0] holds the operation’s encoding.
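The field boundaries follow directly from the widths above (6/5/5/16 for I-Type, 6/5/5/5/5/6 for R-Type). A short Python sketch of splitting an instruction word into these fields; the dictionary keys are illustrative names, not signals of the lab's design.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return word[hi:lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode(ir: int) -> dict:
    """Split a 32-bit simplified-DLX instruction word into its fields."""
    opcode = bits(ir, 31, 26)
    if opcode == 0:                    # R-Type: IR[31:26] = 0^6
        return {"type": "R", "rs1": bits(ir, 25, 21), "rs2": bits(ir, 20, 16),
                "rd": bits(ir, 15, 11), "function": bits(ir, 5, 0)}
    return {"type": "I", "opcode": opcode, "rs1": bits(ir, 25, 21),
            "rd": bits(ir, 20, 16), "imm": bits(ir, 15, 0)}
```

Note that RD sits in different bit positions in the two formats (IR[20:16] versus IR[15:11]), which is why the datapath needs a mechanism for selecting the write address.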
Instruction Set
We list below the instruction set of the simplified DLX.
• imm denotes the value of the immediate field in an I-Type instruction.
• sext(imm) denotes the two's complement sign extension of imm to 32 bits.
• The architectural registers of the simplified DLX are all 32 bits wide.
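Sign extension replicates imm[15] into the 16 upper bits. A small Python sketch that returns the 32-bit pattern, plus a helper (our own, for illustration) that reads such a pattern back as a two's complement integer:

```python
def sext(imm: int) -> int:
    """Two's complement sign extension of a 16-bit immediate to a 32-bit pattern."""
    imm &= 0xFFFF
    return imm | 0xFFFF0000 if imm & 0x8000 else imm   # copy imm[15] into bits 31..16

def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x
```

For instance, the immediate 0xFFFA extends to 0xFFFFFFFA, whose value is -6, while 0x7FFF is left unchanged.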
I-Type
[Tables: the I-Type instructions of the simplified DLX; format Opcode (6) | RS1 (5) | RD (5) | Immediate (16).]
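R-Type
[Table: the R-Type instructions of the simplified DLX; format Opcode (6) | RS1 (5) | RS2 (5) | RD (5) | X (5) | Function (6).]
Encoding of the Instruction Set
[Tables: opcode and function encodings of the instruction set.]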
Implementation
The Datapath of the simplified DLX
Architectural Registers
The architectural registers of the simplified DLX are all 32 bits wide:
• 32 General Purpose Registers (GPR): R0 to R31. Note that R0 always holds the value 0;
• Program Counter (PC);
• Instruction Register (IR);
• Special Registers: MAR, MDR, A, B and C.
ALU environment
The ALU supports:
• 2's complement integer addition and subtraction.
• Bitwise logical instructions.
• Comparison instructions.
The GPR environment
The GPR environment is identical to that of the Load/Store Machine.
Shifter environment
The shifter is a 32-bit left/right logical shifter: a zero is pushed in from the right (left) in case of a left (right) shift.
The control inputs are:
• shift - indicates whether a shift should take place (otherwise the output equals the input).
• right - indicates whether the shift is a right shift.
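A behavioral Python sketch of the shifter's two control inputs. It assumes a shift by one position per operation, which is how we read "a zero is pushed in"; the function name is our own.

```python
MASK32 = 0xFFFFFFFF

def shifter(x: int, shift: bool, right: bool) -> int:
    """32-bit logical shifter, assumed to shift by one position.

    shift = 0: the output equals the input.
    right selects the direction; the vacated bit is always filled with 0.
    """
    x &= MASK32
    if not shift:
        return x
    return (x >> 1) if right else ((x << 1) & MASK32)
```

With x = 0x80000001, a left shift drops bit 31 and yields 0x00000002, while a right shift drops bit 0 and yields 0x40000000.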
Control
Access to the memory is done via the Memory Access Control module, as described for the Load/Store Machine.
The reset signal causes the control of the DLX to move to the "init" state.
[Figure: control state diagram of the simplified DLX.]
The transitions of the control are labeled by:
• step_en – from the I/O control logic.
• busy – from the memory access control.
• D1 ... D12 – corresponding to the decoding of the instructions.
• else – corresponds to an illegal instruction.
• bt (branch taken) – corresponds to the event that the condition of a conditional branch is satisfied.
The control signals
The control signals are used to communicate between the datapath and the control.
[Table: the active control signals in each state.]
Examples (control paths of individual instructions):
• lw RD RS1 imm
• jalr RS1
• sub RD RS1 RS2
• beqz RD RS1 imm
Hand Out #7
Advanced Computer Structure Laboratory
Chapter 7: A simplified DLX – Part II
• This handout includes four sections.
• The recommended schedule and guidelines are in the handout.
• The weight of this handout is 400 points.
• Questions 2 and 4 require the approval of the lab’s engineer (the form is on the website); please attach it to the submitted project.
• Please attach to your project the timing report produced by Xilinx, which verifies that your design meets the timing requirements.
• The programming assignment will be published in the following weeks.
copyrights © Moti Medina
Building blocks
• The GPR environment
– The GPR environment is identical to that of the Load/Store Machine.
– Please implement AEQZ.
– A CADR selection mechanism should be implemented.
• The IR environment
– Inputs: IR_CE, CLK.
– Outputs: IR_OUT[31:0], sext(imm).
– $\mathrm{sext}(imm[15:0]) = imm[15]^{16} \cdot imm[15:0]$, where $\cdot$ denotes concatenation.
• The PC environment
– 32-bit register with a RESET port.
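A Python sketch of the two GPR-environment additions. It assumes that AEQZ means "register A equals zero" and is used to compute bt for beqz/bnez, and that the CADR mechanism chooses between IR[20:16] (I-Type), IR[15:11] (R-Type) and R31 for jalr. These readings, and the parameter names `r_type` and `jlink`, are assumptions for illustration, not the handout's specification.

```python
def aeqz(a: int) -> int:
    """AEQZ = 1 iff the 32-bit value in register A equals zero."""
    return int((a & 0xFFFFFFFF) == 0)

def branch_taken(a: int, is_beqz: bool) -> int:
    """bt: beqz branches when A == 0, bnez when A != 0."""
    return aeqz(a) if is_beqz else 1 - aeqz(a)

def cadr(ir: int, r_type: bool, jlink: bool) -> int:
    """Hypothetical CADR selection: the GPR index that register C is written to."""
    if jlink:                    # assumed: jalr saves the return address in R31
        return 31
    if r_type:                   # R-Type: RD = IR[15:11]
        return (ir >> 11) & 0x1F
    return (ir >> 16) & 0x1F     # I-Type: RD = IR[20:16]
```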
Building blocks
• The MMU
– Input: AO[31:0].
– Output: $0^{8} \cdot AO[23:0]$.
• ALU environment
– Inputs: A[31:0], B[31:0], ALUF[2:0], TEST, ADD.
– Outputs: ALU_OUT[31:0], NEG.
– Reminder #1: Let $A[n-1:0], B[n-1:0] \in \{0,1\}^n$ and denote by $[\cdot]$ the two’s complement value. Then $[A[n-1:0]] - [B[n-1:0]] = [A[n-1:0]] + [\neg B[n-1:0]] + 1$.
– Reminder #2: $B' \in \{0,1\}^n$, $B' = \mathrm{XOR}(B, ADD)$.
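Reminder #1 is what lets one adder also subtract: invert B bitwise and feed the "+1" in as the carry-in. A Python sketch that checks the identity exhaustively for a small n and models an n-bit adder/subtractor. We use a flag `sub` (1 = subtract) rather than guess the polarity of the slide's ADD signal; the function names are our own.

```python
def to_signed(x: int, n: int) -> int:
    """[x]: the value of the n-bit vector x in two's complement."""
    x &= (1 << n) - 1
    return x - (1 << n) if x >> (n - 1) else x

def add_sub(a: int, b: int, sub: int, n: int = 32) -> int:
    """n-bit adder/subtractor: B' = B XOR sub^n, carry-in = sub."""
    mask = (1 << n) - 1
    b_prime = (b ^ (mask if sub else 0)) & mask   # invert every bit of B when subtracting
    return (a + b_prime + sub) & mask             # the "+1" enters as the carry-in

def check_identity(n: int) -> bool:
    """Check [A] - [B] = [A] + [NOT B] + 1 for all n-bit A and B."""
    mask = (1 << n) - 1
    return all(
        to_signed(a, n) - to_signed(b, n) == to_signed(a, n) + to_signed(~b & mask, n) + 1
        for a in range(1 << n) for b in range(1 << n)
    )
```

The identity holds exactly because $[\neg B] = -1 - [B]$; the n-bit result of `add_sub` equals $[A] - [B]$ only when no overflow occurs.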
Building blocks
• ALU environment (cont.)
– We suggest that you use three 16-bit adder/subtractors from the Xilinx library (ADSU16) to build a 32-bit Conditional Sum Adder.
– Use the ADSU16 components in the following way:
[Figure: connection of the three ADSU16 components.]
– Reminder #3: “Computer Structure Lecture Notes” by Dr. Guy Even.
  • Chapter 8 – Addition (8.4 Conditional Sum Adder)
  • Chapter 10 – Signed Addition
– Reminder #4: “Computer Architecture - Complexity and Correctness” by S. M. Müller and W. J. Paul.
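The conditional sum idea with three 16-bit adders: one adds the low halves, and two add the high halves in parallel, one assuming a carry of 0 and one assuming 1; the low half's carry-out then selects the right high result. A behavioral Python sketch (a plain 16-bit adder stands in for the ADSU16 primitive; the names are ours):

```python
def add16(a: int, b: int, cin: int) -> tuple:
    """Behavioral 16-bit adder: returns (sum[15:0], carry-out)."""
    total = (a & 0xFFFF) + (b & 0xFFFF) + cin
    return total & 0xFFFF, total >> 16

def csa32(a: int, b: int, cin: int = 0) -> tuple:
    """32-bit conditional sum adder built from three 16-bit adders."""
    a, b = a & 0xFFFFFFFF, b & 0xFFFFFFFF
    lo_sum, lo_carry = add16(a, b, cin)        # low halves
    hi0 = add16(a >> 16, b >> 16, 0)           # high halves, assuming carry-in 0
    hi1 = add16(a >> 16, b >> 16, 1)           # high halves, assuming carry-in 1
    hi_sum, cout = hi1 if lo_carry else hi0    # the low carry selects the result
    return (hi_sum << 16) | lo_sum, cout
```

Subtraction reuses the same structure: pass the inverted B and a carry-in of 1, e.g. `csa32(5, ~3 & 0xFFFFFFFF, 1)` gives a sum of 2.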
ALU: Tasks performed in the control states
State        | Operation
Decode       | add
Alu          | op (op = add/sub/and/or/xor)
AluI         | add
TestI        | rel (rel = lt, eq, gt, le, ge, ne)
Adr. Comp.   | add
B. Taken     | add
JR           | add
SavePC       | add
JALR         | add
ALU: Control Signals
Signals that control the functionality of the ALU:
• add (active during states: Decode, AluI, Adr. Comp., B. Taken, SavePC, JR, JALR): forces ALUF[2:0] = 011, i.e., addition.
• test (active during state: TestI): the ALU outputs the comparison result $0^{31} \cdot COMP\_OUT$.
• ALUF[2:0] – arithmetic / logical ALU operations, with IR[2:0] = func[2:0]:
  add = 011, sub = 010, and = 110, or = 101, xor = 100.
• ALUF[2:0] – test conditions, with IR[28:26] = opcode[2:0]:
  gt = 001, eq = 010, ge = 011, lt = 100, ne = 101, le = 110.
ALU: Implementation
[Figure: A[31:0] and B[31:0] feed ADD-SUB(32) (input sub; outputs S[31:0], neg, ovf), AND(32), OR(32) and XOR(32). MUX(32) stages controlled by F[0], F[1] and F[2] select among the results, and a final MUX(32) controlled by test selects $0^{31} \cdot COMP\_OUT$ instead, giving ALU_OUT[31:0].]
ALU: Implementation (cont’)
[Figure: the comparator. A ZERO(32) tester on S[31:0] and the neg flag are combined with INV, AND and OR gates: $COMP\_OUT = (F[2] \wedge neg) \vee (F[1] \wedge zero) \vee (F[0] \wedge \neg neg \wedge \neg zero)$.]
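A behavioral Python sketch of the ALU, useful as a reference model when writing test vectors. It reads the encodings from the figure as add 011, sub 010, and 110, or 101, xor 100 and, for the tests, F[2] = lt, F[1] = eq, F[0] = gt; treat these as assumptions and check them against the lab notes. The comparator uses the exact difference [A] - [B], so overflow cannot flip the sign (in hardware this is what the ovf output is for). Only ALU_OUT is returned; NEG is not modeled as a separate output.

```python
MASK32 = 0xFFFFFFFF
ALU_OPS = {0b011: "add", 0b010: "sub", 0b110: "and", 0b101: "or", 0b100: "xor"}

def to_signed32(x: int) -> int:
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def comp_out(a: int, b: int, f: int) -> int:
    """COMP_OUT for test condition f, from the exact difference [A] - [B]."""
    diff = to_signed32(a) - to_signed32(b)
    neg, zero = diff < 0, diff == 0
    lt = bool(f & 0b100) and neg
    eq = bool(f & 0b010) and zero
    gt = bool(f & 0b001) and not neg and not zero
    return int(lt or eq or gt)

def alu(a: int, b: int, aluf: int, test: int = 0, add: int = 0) -> int:
    """Behavioral ALU sketch returning ALU_OUT[31:0]."""
    a, b = a & MASK32, b & MASK32
    if add:                      # 'add' forces ALUF = 011 (addition)
        aluf = 0b011
    if test:                     # ALU_OUT = 0^31 . COMP_OUT
        return comp_out(a, b, aluf)
    op = ALU_OPS[aluf]
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    return a ^ b
```

The case `alu(0x7FFFFFFF, 0x80000000, 0b001, test=1)` is a good test vector: the 32-bit difference overflows, yet "gt" must still be 1.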
“Register B”
The instructions in which register B is loaded:
• add
• sub
• and
• or
• xor
• store
Register B is not involved in the computation of instructions that do not need it, so loading it in every instruction preserves correct functionality. Loading register B always (during the Decode state) shortens the path in the control state machine for the instructions that do need register B.
Good Luck
Advanced Computer Structure Laboratory
DLX Recitation
Problem #1: Convert to DLX’s Assembly
    if (i == j)
        goto L1;
    f = g + h;
L1: f = f - i;

    xor  r1  r19 r20    ; r1 = 0 iff i == j
    beqz r1  1          ; if (i == j) skip the next instruction
    add  r16 r17 r18    ; f = g + h
    sub  r16 r16 r19    ; L1: f = f - i

LEGEND: r16 = f, r17 = g, r18 = h, r19 = i, r20 = j, r21 = k
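Problem #2: Convert to DLX’s Assembly
LOOP: g = g + A[i];
      i = i + j;
      if (i != h) goto LOOP;

      addi r4  r0  Astart   ; r4 = Astart
LOOP: add  r1  r4  r19      ; r1 = address of A[i]
      lw   r2  r1  0        ; r2 = A[i]
      add  r17 r17 r2       ; g = g + A[i]
      add  r19 r19 r20      ; i = i + j
      xor  r3  r19 r18      ; r3 = 0 iff i == h
      bnez r3  -6           ; if (i != h) goto LOOP
Address of A[0] = Astart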
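To check both translations, here is a tiny Python interpreter for the instructions they use. It assumes word-addressed memory, lw computing M(RS1 + sext(imm)), and a branch target of PC + 1 + sext(imm). These semantics are consistent with the offsets 1 and -6 above but are our reading, not the lab's specification; `run`, `PROBLEM1`, `PROBLEM2` and the value of `ASTART` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sext16(imm: int) -> int:
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def run(program, regs, mem, max_steps=10_000):
    """Execute (mnemonic, operands...) tuples; returns the list of 32 GPRs."""
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            return regs                          # ran past the last instruction
        op, *args = program[pc]
        pc += 1                                  # assumed: offset is added to PC + 1
        if op in ("add", "sub", "xor"):
            rd, rs1, rs2 = args
            a, b = regs[rs1], regs[rs2]
            regs[rd] = {"add": a + b, "sub": a - b, "xor": a ^ b}[op] & MASK32
        elif op == "addi":
            rd, rs1, imm = args
            regs[rd] = (regs[rs1] + sext16(imm)) & MASK32
        elif op == "lw":                         # RD = M(RS1 + sext(imm))
            rd, rs1, imm = args
            regs[rd] = mem[(regs[rs1] + sext16(imm)) & MASK32]
        elif op in ("beqz", "bnez"):
            rs1, imm = args
            if (regs[rs1] == 0) == (op == "beqz"):
                pc += sext16(imm)
        else:
            raise ValueError(f"unsupported instruction {op}")
        regs[0] = 0                              # R0 always holds the value 0
    raise RuntimeError("step limit reached")

PROBLEM1 = [("xor", 1, 19, 20), ("beqz", 1, 1), ("add", 16, 17, 18), ("sub", 16, 16, 19)]
ASTART = 100                                     # hypothetical address of A[0]
PROBLEM2 = [("addi", 4, 0, ASTART), ("add", 1, 4, 19), ("lw", 2, 1, 0),
            ("add", 17, 17, 2), ("add", 19, 19, 20), ("xor", 3, 19, 18), ("bnez", 3, -6)]
```

For Problem #2 with A = [10, 20, 30, 40], i = 0, j = 1, h = 4 and g = 0, the loop body runs four times and r17 ends up holding 100; the offset -6 returns from bnez to the `add r1 r4 r19` line, so addi executes only once.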
Exam, first sitting (Moed A), 5764 (2003/04)
Question 4
We wish to add a new I-type instruction to the machine language of the Simplified DLX:
Chkbit17 RD RS1 imm
This instruction updates RD as follows:
$RD = \begin{cases} 0^{31} \cdot 1 & \text{if } RS1[17] = 1 \\ 0^{32} & \text{otherwise} \end{cases}$
Propose an implementation of the DLX that supports the new instruction while making the smallest possible changes to the datapath (points will be deducted for excessive changes).
1. List the changes required in the datapath to support the execution of the new instruction.
2. Propose an extension of the control state diagram to support the execution of the new instruction. Draw the path of control states that the state machine traverses when executing the new instruction. For each state along this path (new and old), describe the RTL instruction executed in it and the active control signals.
Answer:
4.1) We add the following (marked in red in the figure):
• A mux (Chk17mux, controlled by the signal chk17).
• Zero padding: concatenates 31 zeros on the left.
• We take bit 17 of register A, A[17].
[Figure: datapath additions: A[17] passes through the zero padding into Chk17mux, whose inputs are labeled 1 and 0.]
4.2) We add one more state, CHK17. The execution path of the instruction is marked in blue; the control enters CHK17 if OPCODE = OPCODE(chkbit17).
Note that the active control signals are the ones listed for each state, so the other states are not harmed or changed. We add the state CHK17 (in blue) and we are done.
CHK17: $C = 0^{31} \cdot A[17]$; active control signals: Chk17, Cce.
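A Python sketch of the instruction's semantics and of the added datapath path. It assumes the mux output is what register C loads, as the RTL $C = 0^{31} \cdot A[17]$ with Cce implies; `c_in_old` stands for whatever drove C before the change and, like the function names, is hypothetical.

```python
def chkbit17(rs1: int) -> int:
    """Chkbit17 result: 0^31 . 1 if RS1[17] = 1, otherwise 0^32."""
    return (rs1 >> 17) & 1

def c_register_next(c: int, c_in_old: int, a: int, chk17: int, c_ce: int) -> int:
    """Next value of register C with the added Chk17mux (hypothetical model)."""
    padded = (a >> 17) & 1             # zero padding: 31 zeros to the left of A[17]
    mux_out = padded if chk17 else c_in_old
    return mux_out if c_ce else c      # C changes only when Cce is active
```

Because chk17 is inactive in every other state, C keeps receiving its original input there, which is the point of the note in 4.2.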