Announcement
 No class next week
 Classes on Tuesday 4/12 and Thursday 4/14 are canceled
 Final project presentations on Tuesday 4/19 and Thursday 4/21
 Tuesday: Teams 5-7, Thursday: Teams 1-4
 Send me your presentation slides before Tuesday's class
 Final project report is due before the final exam
 Final review on April 28
 Final exam is scheduled during the exam week
ECE 351 Digital Systems Design
Recap from the last class
 Origin of timing issues
 Unpredictable delay in combinatorial logic
 Setup/hold time in D flip-flops
 Timing issues
 Combinatorial Delay in Synchronous Design
• Static timing analysis
 Asynchronous Inputs
• Timing hazards and solutions
 Metastability
• Quantification
 Timing simulation
ECE 351
Digital Systems Design
Von Neumann Computer Architecture
Wei Gao
Spring 2016
von Neumann Stored Program Computer
 "Stored Program" means the HW is designed to
execute a set of pre-defined instructions
 The program and data reside in a storage unit (i.e.,
memory)
 To change the computer functionality, the program is
changed instead of the HW
 This implies sequential execution
 The drawback is the "von Neumann bottleneck" in getting
data into and out of memory
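The stored-program idea can be illustrated with a minimal Python sketch. The three opcodes (`LOAD`, `ADD`, `HALT`), the single accumulator, and the function name `run` are hypothetical and chosen only for illustration; they are not the instruction set used in this course. The point is that the same "hardware" (`run`) behaves differently when only the contents of memory change.

```python
def run(memory):
    """Execute the program stored in `memory` and return the accumulator."""
    acc = 0  # single working register (assumption of this sketch)
    pc = 0   # program counter
    while True:
        opcode, operand = memory[pc]  # fetch from the storage unit
        pc += 1                       # sequential execution
        if opcode == "LOAD":
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

# Two different programs, same machine.
program_a = [("LOAD", 2), ("ADD", 3), ("HALT", 0)]
program_b = [("LOAD", 10), ("ADD", 10), ("ADD", 1), ("HALT", 0)]

print(run(program_a))  # 5
print(run(program_b))  # 21
```

Every instruction passes through the single `memory[pc]` access, which is where the von Neumann bottleneck shows up in this model.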
Block Diagram of von Neumann Computer
 CPU + Memory
 Clock synchronized via system bus
Block Diagram of the Central Processing
Unit (CPU)
Central Processing Unit (CPU)
 CPU components
 Control unit
• The state machine that directs the execution of instructions
• For a given operation, the state machine traverses a specific
path within its state diagram
 Processing unit
• Contains all of the registers and ALU that hold and manipulate
data
• Memory signals (data/address) go into and out of this unit
 Control signals
• Signals sent to processing unit from the control unit
• Load data into registers
• Select ALU operation
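 Test signals
• Signals sent to the control unit from the processing unit (e.g., the Opcode in IR and the CCR status flags)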
Processing Unit
 Registers within the processing unit
 Instruction Register (IR)
• Holds the Opcode that is read from memory
• Passes the Opcode to the Control Unit as a test signal
 Memory Address Register (MAR)
• Holds the current address being sent to memory
 Program Counter (PC)
• Tracks the address of the instruction being executed
• MAR is loaded from PC when an instruction is fetched
 ALU Operand Register (Z)
• Holds one of the inputs to the ALU
• The other input comes from one of the user-controlled registers
Arithmetic / Logic Unit (ALU)
 Performs arithmetic and logic operations on data
 ALU operations
 We first load Z with the first input
 We then select which user-controlled register is the other input
 The control unit sends select lines to indicate which operation to
perform
 Condition Code Register (CCR)
 Tracks the status of ALU operations (i.e., the N, Z, V, and C flags: negative, zero, overflow, carry)
 These flags are sent to the control unit in order to alter the sequence flow
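How the NZVC flags fall out of an ALU operation can be sketched in Python. The 8-bit width, the operation names `"ADD"` and `"AND"`, and the function name `alu` are assumptions made for illustration; the slides do not specify them.

```python
WIDTH = 8                    # assumed data width
MASK = (1 << WIDTH) - 1      # 0xFF for 8 bits
SIGN = 1 << (WIDTH - 1)      # most significant (sign) bit

def alu(op, a, b):
    """Return (result, flags) for a WIDTH-bit ALU operation on a and b."""
    a &= MASK
    b &= MASK
    if op == "ADD":
        full = a + b
        result = full & MASK
        carry = full > MASK
        # Signed overflow: both operands share a sign that differs
        # from the sign of the result.
        overflow = bool(~(a ^ b) & (a ^ result) & SIGN)
    elif op == "AND":
        result, carry, overflow = a & b, False, False
    else:
        raise ValueError(f"unsupported operation {op!r}")
    flags = {
        "N": bool(result & SIGN),  # negative
        "Z": result == 0,          # zero
        "V": overflow,             # two's-complement overflow
        "C": carry,                # carry out
    }
    return result, flags

print(alu("ADD", 0x7F, 0x01))  # result 0x80 with N and V set
print(alu("ADD", 0xFF, 0x01))  # result 0x00 with Z and C set
```

In the processor described here, `a` would come from the Z register, `b` from the selected user-controlled register, and `flags` would be loaded into the CCR.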
Buses
 We route data in the processing unit between
registers/memory using shared lines called buses
 Two buses are needed
 Bus 1:
• Can take either PC or the User-Controlled Registers
• Will drive to Memory_In or Bus 2
 Bus 2:
• Can take one of ALU, Bus1, or Memory_Out
• Will drive to IR, MAR, PC, User-Controlled Registers, or the ALU Operand Register (Z)
 Bus select lines come from the Control Unit to select
which information is on which bus at any given time.
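The two-bus routing can be modelled as a pair of multiplexers. The following Python sketch is only a behavioural illustration; the function name `drive_buses` and the use of string select values are assumptions, not part of the design on the slides.

```python
def drive_buses(bus1_sel, bus2_sel, pc, regs, alu_out, memory_out):
    """Return (bus1, bus2) for the given bus select lines.

    regs maps user-controlled register names (e.g. "X", "Y") to values.
    """
    # Bus 1 carries either PC or one of the user-controlled registers.
    bus1_sources = {"PC": pc, **regs}
    bus1 = bus1_sources[bus1_sel]
    # Bus 2 carries the ALU output, Bus 1, or Memory_Out.
    bus2_sources = {"ALU": alu_out, "Bus1": bus1, "Memory_Out": memory_out}
    bus2 = bus2_sources[bus2_sel]
    return bus1, bus2

# Routing PC to Bus 2 (as needed for MAR <= PC) takes both select lines.
bus1, bus2 = drive_buses("PC", "Bus1", pc=100, regs={"X": 7, "Y": 4},
                         alu_out=0, memory_out=0)
print(bus1, bus2)  # 100 100
```

Whatever ends up on Bus 2 is only captured by a register whose load signal is asserted.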
Register Loads
 Each register in the processing unit can be loaded by
the control unit
 The input to most registers is Bus2
 The CCR input is the ALU
 Loads are synchronous to the clock: a load asserted in one state takes effect at the clock edge that begins the following state
 A register with a load enable can be described in VHDL as follows:
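MAR_Register : process (Clock, Reset)
begin
    -- Asynchronous, active-low reset clears the register
    if (Reset = '0') then
        MAR <= "0000";
    -- Otherwise update only on the rising clock edge
    elsif (Clock'event and Clock = '1') then
        -- Capture Bus2 only when the control unit asserts MAR_Load
        if (MAR_Load = '1') then
            MAR <= Bus2;
        end if;
    end if;
end process;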
Control Unit Sub-Operations
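 1. Fetch
 Get the next instruction into IR
 PC: program counter, always points to the next instruction
 IR: holds the fetched instruction
[Figure: processor (control unit with controller, PC, IR, and MAR; datapath with ALU, control/status, and registers X and Y), I/O, and memory. Values shown: PC 100, IR "load X,", 4 in the registers; memory holds "load X," at address 100, 123 at 101, and "ADD X, Y" at 102.]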
Control Unit Sub-Operations
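 2. Decode
 Determine what the instruction means
[Figure: processor (control unit with controller, PC, IR, and MAR; datapath with ALU, control/status, and registers X and Y), I/O, and memory. Values shown: PC 100, IR "load X,", 4 in the registers; memory holds "load X," at address 100, 123 at 101, and "ADD X, Y" at 102.]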
Control Unit Sub-Operations
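 3. Fetch operands
 Move data from memory to a data-path register
[Figure: processor (control unit with controller, PC, IR, and MAR; datapath with ALU, control/status, and registers X and Y), I/O, and memory. Values shown: PC 100, IR "load X,", and 4, 101, and 123 among the registers; memory holds "load X," at address 100, 123 at 101, and "ADD X, Y" at 102.]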
Control Unit Sub-Operations
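 4. Execute
 Move data through the ALU
[Figure: processor (control unit with controller, PC, IR, and MAR; datapath with ALU, control/status, and registers X and Y), I/O, and memory. Values shown: PC 100, IR "load X,", and 4, 101, and 123 among the registers; memory holds "load X," at address 100, 123 at 101, and "ADD X, Y" at 102.]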
Control Unit Sub-Operations
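 5. Store results
 Write data from a register to memory
[Figure: processor (control unit with controller, PC, IR, and MAR; datapath with ALU, control/status, and registers X and Y), I/O, and memory. Values shown: PC 100, IR "load X,", and 4, 101, and 123 among the registers; memory holds "load X," at address 100, 123 at 101, and "ADD X, Y" at 102.]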
Instruction Execution
 State 0
 Put the current Program Counter value on the Memory Address Bus to read the first Opcode
RTL: MAR <= PC
Control: Bus1_Sel = PC, Bus2_Sel = Bus1, MAR_Load
 State 1
 Bring in the contents of memory (the Opcode) and put it into IR
 Increment PC to point at either the Operand or the next Opcode in memory
RTL: IR <= Memory_Out, PC <= PC + 1
Control: Bus2_Sel = Memory_Out, IR_Load, PC_Inc
Instruction Execution
 State 2
 The Control Unit now decodes IR
 This dictates the next state and which control signals are asserted (here, IR = ADD_XY)
RTL: Z <= X
Control: Bus1_Sel = X, Bus2_Sel = Bus1, Z_Load
 State 3
RTL: ALU = ADD (with Y on Bus1 and X_Load asserted, this performs X <= Z + Y)
Control: Bus1_Sel = Y, Bus2_Sel = ALU, ALU_Sel = ADD, X_Load, CCR_Load
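The four states of the ADD_XY instruction can be traced with a short Python sketch. It assumes an 8-bit datapath, models only the Z and C flags of the CCR, and uses example values (X = 123, Y = 4, instruction at address 102) chosen for illustration; the function name `run_add_xy` is hypothetical.

```python
WIDTH = 8                  # assumed register width
MASK = (1 << WIDTH) - 1

def run_add_xy(regs, memory):
    """Apply States 0-3 to `regs` for one ADD_XY instruction."""
    # State 0: MAR <= PC (Bus1_Sel = PC, Bus2_Sel = Bus1, MAR_Load)
    regs["MAR"] = regs["PC"]
    # State 1: IR <= Memory_Out, PC <= PC + 1 (IR_Load, PC_Inc)
    regs["IR"] = memory[regs["MAR"]]
    regs["PC"] += 1
    # State 2: decode IR, then Z <= X (Bus1_Sel = X, Bus2_Sel = Bus1, Z_Load)
    if regs["IR"] != "ADD_XY":
        raise ValueError("this sketch only handles ADD_XY")
    regs["Z"] = regs["X"]
    # State 3: the ALU adds Z and Y (Bus1_Sel = Y, ALU_Sel = ADD);
    # X_Load and CCR_Load capture the result and the flags.
    total = regs["Z"] + regs["Y"]
    regs["X"] = total & MASK
    regs["CCR"] = {"Z": regs["X"] == 0, "C": total > MASK}
    return regs

memory = {102: "ADD_XY"}
regs = {"PC": 102, "MAR": 0, "IR": None, "X": 123, "Y": 4, "Z": 0, "CCR": {}}
run_add_xy(regs, memory)
print(regs["X"], regs["PC"])  # 127 103
```

Each commented step corresponds to one clock cycle of the control unit's state machine, so this instruction takes four cycles in the model.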
Von Neumann Bottleneck
 We have seen that the von Neumann computer is
serial in its execution of instructions
 This is good for simplicity, but can limit performance
 There are many techniques to improve the
performance of this computer
 Functional Timing
 Memory Architecture
 Algorithmic Branch Prediction
 Pipelines
1. Functional Timing
 A delay (or phase) can be added to the clock that the
B-register sees. This creates a single-shot structure
which executes in 1 cycle
 This allows multiple register transfers in one clock cycle
[Figure: registers A and B (D/Q) clocked by CLKA and CLKB from the controller, with a timing diagram of CLK, LOAD, CLKA, CLKB, AQ, and BQ annotated with tphase, tCQ, A(0), and A(1)]
2. Memory Architecture
 The main bottleneck is trying to get Opcodes,
Operands, and Data out of memory
 Memory runs slower than the CPU, so accesses must be slowed to accommodate the memory technology (e.g., DRAM stores bits on capacitors)
 Cache memory is a technique to improve the speed of memory access
 Cache is a smaller, faster SRAM placed on the same chip as the CPU
 Cache improves performance by reducing latency
 Latency = the timing overhead associated with accessing memory
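The effect of a cache on latency is commonly estimated with the average memory access time: hit time plus miss rate times miss penalty. The Python sketch below applies that standard formula; the timing numbers are made-up example values, not measurements of any real system.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time.

    hit_time:     time to access the cache
    miss_rate:    fraction of accesses not found in the cache (0.0 to 1.0)
    miss_penalty: extra time to go to main memory on a miss
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty

# Hypothetical numbers: 1 ns cache, 50 ns extra for DRAM, 5% misses.
with_cache = average_access_time(hit_time=1.0, miss_rate=0.05, miss_penalty=50.0)
# Without a cache every access pays the full DRAM cost.
without_cache = average_access_time(hit_time=1.0, miss_rate=1.0, miss_penalty=50.0)
print(with_cache, without_cache)
```

With these example values most accesses complete at cache speed, so the average sits far closer to the cache latency than to the DRAM latency.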
3. Algorithmic Branch Prediction
 Algorithms can be developed to “predict” a potential branch
 This allows the memory controller to load a portion of the cache with the code that would be executed if the branch were taken
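As one concrete example of such an algorithm, the Python sketch below implements a two-bit saturating counter, a widely used prediction scheme. It is not necessarily the scheme this lecture has in mind; the class name `TwoBitPredictor`, the helper `accuracy`, and the starting state are assumptions made for illustration.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.state = 1  # start at "weakly not taken" (arbitrary choice)

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

def accuracy(outcomes):
    """Fraction of branch outcomes (True = taken) predicted correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)

# A loop branch taken nine times and then falling through once.
print(accuracy([True] * 9 + [False]))  # 0.8
```

A correct "taken" prediction is what would let the memory system have the branch target code in the cache before it is needed.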
4. Pipelining
[Figure: non-pipelined vs. pipelined dish cleaning (Wash and Dry stages, dishes 1-8) and pipelined instruction execution (Fetch-instr., Decode, Fetch ops., Execute, and Store res. stages, instructions 1-8), each plotted against Time]
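The benefit of pipelining can be estimated with a simple count of stage times. The Python sketch below assumes every stage takes the same time and that the pipeline never stalls, which real processors do not guarantee; the function names are hypothetical.

```python
def non_pipelined_time(n_items, n_stages, stage_time=1):
    """Each item passes through all stages before the next item starts."""
    return n_items * n_stages * stage_time

def pipelined_time(n_items, n_stages, stage_time=1):
    """The first item fills the pipeline; each later item finishes one stage time later."""
    return (n_stages + n_items - 1) * stage_time

# Dish cleaning: 8 dishes, 2 stages (wash, dry).
print(non_pipelined_time(8, 2), pipelined_time(8, 2))  # 16 9

# Instruction execution: 8 instructions, 5 stages
# (fetch instruction, decode, fetch operands, execute, store results).
print(non_pipelined_time(8, 5), pipelined_time(8, 5))  # 40 12
```

Under these assumptions the speedup approaches the number of stages as the number of items grows, because the fill time of the pipeline is paid only once.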
Summary
 Von Neumann architecture
 CPU + memory
 Central Processing Unit (CPU)
 Control unit
 Processing unit
 Control/test signals
 Control unit operations
 Instruction execution
 Von Neumann improvements
 Functional timing
 Memory architecture: using cache
 Algorithmic branch prediction
 Pipelining