TDC 311 Microarchitecture
Dr. C.M.White
All CPUs execute the machine code for which they are designed. IBM mainframes execute
IBM 370 machine code; Motorola 680x0 microprocessors execute 680x0 machine code; and so on. The
CPU fetches a machine code instruction, decodes the opcode and operands, and executes the
instruction. CPUs that are not microprogrammed perform this fetch, decode and execute cycle directly in
hardware (digital logic). CPUs that are microprogrammed fetch, decode and execute the
machine code instructions by executing microcode. Each machine code instruction causes a
different section of microcode to execute, and executing that microcode sends out the control
signals that open and close the gates and operate the registers, ALU, etc.
For example, the COBOL statement:
    Add 1 to Total.
might be compiled into the following machine language statements:
    Load  Reg1,Total
    Add   Reg1,+1
    Store Reg1,Total
Each of these machine language statements, such as Load Reg1,Total, must in turn be executed by the
microarchitecture.
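To make the microprogrammed idea concrete, here is a minimal Python sketch (not how a real control unit is built) of a fetch, decode and execute loop for the three-instruction program above. Each opcode dispatches to its own routine, which stands in for the separate section of microcode a microprogrammed CPU would run. The routine names, the `run` function and the dictionary-based machine state are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

Operand = Union[str, int]

def mc_load(state: dict, reg: str, address: str) -> None:
    state["regs"][reg] = state["mem"][address]      # register <- memory

def mc_add(state: dict, reg: str, immediate: int) -> None:
    state["regs"][reg] += immediate                 # register <- register + constant

def mc_store(state: dict, reg: str, address: str) -> None:
    state["mem"][address] = state["regs"][reg]      # memory <- register

# Opcode -> routine: plays the role of "a different section of microcode per instruction".
MICROCODE = {"Load": mc_load, "Add": mc_add, "Store": mc_store}

def run(program: List[Tuple[str, str, Operand]], memory: Dict[str, int]) -> dict:
    state = {"regs": {"Reg1": 0}, "mem": dict(memory), "pc": 0}
    while state["pc"] < len(program):
        opcode, reg, operand = program[state["pc"]]  # fetch and decode
        state["pc"] += 1                             # PC now holds the NEXT instruction
        MICROCODE[opcode](state, reg, operand)       # execute via the opcode's routine
    return state

program = [("Load", "Reg1", "Total"), ("Add", "Reg1", 1), ("Store", "Reg1", "Total")]
print(run(program, {"Total": 41})["mem"]["Total"])  # 42
```

In a hardwired CPU the dispatch on the opcode would be done by decoding logic; here a dictionary lookup plays that part.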
Example Microarchitecture Data Path
Examine Figure 4-1, page 205, closely. Note how the C bus feeds into the registers, how the
registers feed the B bus, and how the H register feeds the A bus. There is a switch (shown as a small
arrow) at each point where a register connects to a bus or a bus connects to a register.
The following are the internal 32-bit registers:
MAR - memory address register
MDR - memory data register
MBR - memory buffer register
PC - program counter (address of NEXT instruction)
(The following registers might be discussed later: SP, LV, CPP, TOS, OPC, H)
ALU - arithmetic logic unit - can add, AND, OR, and complement.
The function the ALU performs is selected by 6 control inputs (F0, F1, ENA, ENB, INVA, INC); see Figure 4-2, page 206.
(Typo in Figure 4-2: Second occurrences of A and B should be ~A and ~B)
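The six control inputs can be modelled in a few lines of Python. This sketch assumes the ALU behaves like the Mic-1 ALU in Tanenbaum's Structured Computer Organization, which these figure numbers appear to reference: F0 and F1 choose among A AND B, A OR B, NOT B and A + B; ENA and ENB enable the A and B inputs (a disabled input reads as 0); INVA complements the A input; and INC adds 1 as the carry into the adder. The function name `alu` and the 32-bit masking are choices made for this illustration.

```python
MASK32 = 0xFFFFFFFF

def alu(a: int, b: int, f0: int, f1: int, ena: int, enb: int, inva: int, inc: int) -> int:
    """Return the 32-bit ALU output for the given six control bits."""
    x = (a if ena else 0) & MASK32        # A input, forced to 0 when ENA = 0
    y = (b if enb else 0) & MASK32        # B input, forced to 0 when ENB = 0
    if inva:
        x = ~x & MASK32                   # INVA complements the A input
    function = (f0 << 1) | f1
    if function == 0b00:
        result = x & y                    # A AND B
    elif function == 0b01:
        result = x | y                    # A OR B
    elif function == 0b10:
        result = ~y & MASK32              # NOT B
    else:
        result = x + y + inc              # A + B, with INC as the carry in
    return result & MASK32

# Control bits in the order F0 F1 ENA ENB INVA INC.
print(alu(5, 7, 1, 1, 1, 1, 0, 0))       # 12 (A + B)
print(alu(5, 7, 1, 1, 1, 1, 1, 1))       # 2  (B - A, computed as ~A + B + 1)
print(alu(5, 7, 1, 1, 0, 1, 0, 1))       # 8  (B + 1)
print(hex(alu(5, 7, 1, 1, 0, 0, 1, 0)))  # 0xffffffff (-1)
```

Subtraction needs no separate circuit: inverting A and setting the carry in forms the two's complement of A, which the adder then adds to B.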
Shifter - can shift left logical one byte, right arithmetic one bit, or no shift.
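A matching sketch of the shifter. It assumes two control bits named SLL8 (shift left logical by 8 bits, i.e. one byte) and SRA1 (shift right arithmetic by 1 bit), following the Mic-1 naming. The function name is hypothetical, and setting both bits at once is simply treated as an error here.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000

def shifter(value: int, sll8: int, sra1: int) -> int:
    """Model the shifter stage applied to a 32-bit ALU output."""
    if sll8 and sra1:
        raise ValueError("SLL8 and SRA1 should not both be set")
    value &= MASK32
    if sll8:
        # Shift left one byte; bits pushed past bit 31 are lost.
        return (value << 8) & MASK32
    if sra1:
        # Shift right one bit, copying the old sign bit back into bit 31.
        return (value >> 1) | (value & SIGN_BIT)
    return value  # no shift

print(hex(shifter(0x12345678, 1, 0)))  # 0x34567800
print(hex(shifter(0xFFFFFFFE, 0, 1)))  # 0xffffffff (-2 >> 1 == -1)
```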
All operations require precise timing of the signals on the buses and values entering and leaving
the registers. See Figure 4-3 page 208.
Reading and writing main memory can be performed in two ways:
1. 32-bit words: MAR holds the 32-bit address of a word in storage, and the data is read into or
written from the MDR. Addresses in MAR count words, not bytes: if the address in MAR is 4, it
means access word number 4 (bytes 16 through 19).
2. 8-bit bytes: the PC holds the 32-bit address of a byte in storage, and the byte at that address is read
into the low 8 bits of the MBR register. If the address in PC is 4, it means access byte number 4.
When you load the MBR with the byte, you can do a logical load (the upper 24 bits are set to 0) or an
arithmetic load (the upper 24 bits are filled with copies of the byte's sign bit). The arithmetic load
is called sign extension.
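The address arithmetic and the two kinds of byte load can be checked with a short Python sketch. It assumes 4-byte words (so word address w starts at byte address 4w) and follows the Mic-1 convention in which MBRU denotes the zero-extended byte and MBR the sign-extended one. The function names are hypothetical.

```python
def word_to_byte_address(word_address: int) -> int:
    """A word address in MAR names a 4-byte word; return the address of its first byte."""
    return word_address * 4

def zero_extend(byte: int) -> int:
    """Logical load (MBRU): the upper 24 bits are 0."""
    return byte & 0xFF

def sign_extend(byte: int) -> int:
    """Arithmetic load (MBR): the upper 24 bits copy bit 7 of the byte."""
    byte &= 0xFF
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte

print(word_to_byte_address(4))  # 16: word 4 occupies bytes 16-19
print(hex(zero_extend(0xFE)))   # 0xfe
print(hex(sign_extend(0xFE)))   # 0xfffffffe (-2 as a 32-bit value)
```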
The Microinstructions
The control unit needs to send out a total of 29 signals:
9 for registers to B bus
9 for C bus to registers
6 for ALU control
2 for shifter control
2 for Read/Write via MAR/MDR
1 to indicate memory fetch via PC/MBR.
These signals can be packed into a microinstruction whose fields are shown in Figure 4-5, page 212.
The complete block diagram for the example microarchitecture is shown in Figure 4-6 Page 214.
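Packing and unpacking a microinstruction is plain bit manipulation. The field layout below is an assumption taken from Tanenbaum's Mic-1 (Addr 9 bits, JAM 3, ALU 8 for the 6 ALU and 2 shifter signals, C 9, Mem 3 for write/read/fetch, B 4), most significant field first, totalling 36 bits. In that design the 9 "register to B bus" signals are encoded in the 4-bit B field and recovered by a decoder, which is how the signals fit alongside a 9-bit next address and 3 JAM bits. The `pack` and `unpack` helpers are hypothetical.

```python
# Assumed Mic-1 field layout, most significant field first; widths total 36 bits.
FIELDS = [("addr", 9), ("jam", 3), ("alu", 8), ("c", 9), ("mem", 3), ("b", 4)]

def pack(**values: int) -> int:
    """Build a 36-bit microinstruction from named field values (missing fields are 0)."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"field {name!r} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack(word: int) -> dict:
    """Split a 36-bit microinstruction back into its named fields."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields

mi = pack(addr=0x1FF, b=5)
print(hex(mi), unpack(mi)["addr"])  # 0xff8000005 511
```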
Note how bits from the MIR leave the MIR as on or off signals (1 or 0) which control the
operation of gates, shifting, adding, register selection, and address creation.
Control store - contains 512 microinstructions, each 36 bits wide. Note: when executing a microprogram,
you do not simply execute the next microinstruction in sequence. Each microinstruction tells you which
microinstruction to execute next.
MIR: the instruction register for the control store
MPC: the program counter for the control store
During subcycle 1, MIR is loaded with the control store word at the address currently held in MPC.
During subcycle 2, the signals from MIR propagate out and the B bus is loaded from the selected
register.
During subcycle 3, the ALU and shifter operate and produce a stable result.
During subcycle 4, the C bus, memory buses, and ALU values become stable, and the selected
registers are loaded from the C bus. The MBR and MDR get their results from the memory operation
started at the end of the previous data path cycle. The MPC is loaded in preparation for the next
microinstruction.
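Because every microinstruction names its successor, the control loop itself is short. The Python sketch below is heavily simplified: each control store word is modelled as a (label, next_address) pair instead of a 36-bit word, the data path work of subcycles 2 to 4 is reduced to recording the label, and using `None` as a next address to stop is an invention of this sketch. The addresses in the example are arbitrary.

```python
from typing import List, Optional, Tuple

Microinstruction = Tuple[str, Optional[int]]

def run_microprogram(control_store: List[Optional[Microinstruction]],
                     start: int, max_steps: int = 100) -> List[str]:
    mpc = start
    trace = []
    for _ in range(max_steps):
        mir = control_store[mpc]     # subcycle 1: load MIR from the control store at MPC
        if mir is None:
            raise ValueError(f"no microinstruction at address {mpc}")
        label, next_address = mir
        trace.append(label)          # subcycles 2-4: the data path would act on MIR here
        if next_address is None:
            break
        mpc = next_address           # MPC loaded for the next microinstruction
    return trace

control_store: List[Optional[Microinstruction]] = [None] * 512
control_store[0] = ("step A", 7)
control_store[7] = ("step B", 3)
control_store[3] = ("step C", None)
print(run_microprogram(control_store, 0))  # ['step A', 'step B', 'step C']
```

The jump from address 0 to 7 to 3 shows why MPC is not simply incremented.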
A Simple Example
Increment the Program Counter by 1 (PC = PC + 1). What are the events that will cause this to
happen?
1. Gate PC onto B bus.
2. Perform B+1 function in ALU
3. Gate C bus into PC
Try another one:
MAR = MBRU + H; rd
SP = MBR = SP + 1
MDR = SP + H
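The PC = PC + 1 steps above, and the data-path-only exercise MDR = SP + H, can be traced with a toy Python model of one data path cycle. It is a sketch under simplifying assumptions: registers live in a dictionary, the A bus always carries H (as in Figure 4-1), the ALU operation is passed in as a Python function instead of six control bits, the shifter is left at "no shift", and memory operations such as rd are ignored. The names `datapath_cycle` and `regs` are hypothetical.

```python
from typing import Callable, Dict, Iterable

MASK32 = 0xFFFFFFFF

def datapath_cycle(regs: Dict[str, int], b_source: str,
                   alu_op: Callable[[int, int], int],
                   c_destinations: Iterable[str]) -> None:
    """Run one simplified data path cycle, updating regs in place."""
    a_bus = regs["H"]                        # H always drives the A bus
    b_bus = regs[b_source]                   # gate the selected register onto the B bus
    c_bus = alu_op(a_bus, b_bus) & MASK32    # ALU result, shifter set to "no shift"
    for name in c_destinations:              # gate the C bus into each selected register
        regs[name] = c_bus

regs = {"PC": 100, "SP": 0x400, "H": 8, "MDR": 0}
datapath_cycle(regs, "PC", lambda a, b: b + 1, ["PC"])   # PC = PC + 1
datapath_cycle(regs, "SP", lambda a, b: a + b, ["MDR"])  # MDR = SP + H
print(regs["PC"], hex(regs["MDR"]))  # 101 0x408
```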
Design of the Microarchitecture Level
1. Speed vs. cost
Reduce the number of clock cycles needed to execute an instruction
Simplify the organization so that the clock cycle can be shorter
Overlap the execution of instructions
2. Reduce the execution path length (the number of clock cycles needed to execute a set of
operations)
Say a machine code instruction requires 5 microinstructions. Is there any way to cut that
down to 4 microinstructions by performing 2 operations at the same time?
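The payoff of a shorter path length is simple arithmetic. Assuming every microinstruction takes one data path cycle and the cycle time does not change, execution time is proportional to the number of microinstructions executed, so cutting an instruction from 5 to 4 microinstructions saves 20% of its time. The figures in the example (one million instructions, a 2 ns cycle) are invented for illustration.

```python
def execution_time_ns(instructions: int, micro_per_instruction: int, cycle_ns: int) -> int:
    """Total time, assuming one data path cycle per microinstruction."""
    return instructions * micro_per_instruction * cycle_ns

before = execution_time_ns(1_000_000, 5, 2)
after = execution_time_ns(1_000_000, 4, 2)
print(before, after, (before - after) / before)  # 10000000 8000000 0.2
```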
3. Add another internal bus (See Figure 4-29 Page 252)
If you extend the A bus so that any register can drive either the A bus or the B bus, an operand no
longer has to be copied into H first, which shortens some operations.
4. Create an independent unit that fetches and processes the instructions.
5. Prefetch the instruction
6. Perform pipelining (See Figure 4-34 page 259)
Unfortunately, a branch disrupts the pipeline: the instructions already fetched after the branch may be the wrong ones and must then be discarded.
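The usual back-of-the-envelope model shows both the benefit of pipelining and the cost of branches. For n instructions on a k-stage pipeline, the first instruction needs k cycles and each later one completes one cycle after its predecessor, giving k + n - 1 cycles instead of n x k. The sketch also charges k - 1 extra cycles for each pipeline flush caused by a branch; that penalty is a simplifying assumption, since the real cost depends on the stage at which the branch is resolved.

```python
def unpipelined_cycles(n: int, k: int) -> int:
    """Each instruction passes through all k stages before the next one starts."""
    return n * k

def pipelined_cycles(n: int, k: int, flushes: int = 0) -> int:
    """k cycles to fill the pipeline, then one completion per cycle.
    Each flush is assumed to waste k - 1 cycles refilling the pipeline."""
    return k + (n - 1) + flushes * (k - 1)

print(unpipelined_cycles(100, 5))            # 500
print(pipelined_cycles(100, 5))              # 104
print(pipelined_cycles(100, 5, flushes=10))  # 144
```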
Improving Performance
1. Cache memory
Memory references tend to cluster near a few locations (the locality principle): the program's
instructions are in one region, and its data are often in another. Keeping the most recently
referenced values in a small, high-speed cache lets most references avoid the slower main memory.
How does the CPU know whether something is in the cache?
Direct-mapped cache
Consider a cache which has 2048 entries, each entry holding 32 bytes (not bits!) of data. 2048
entries times 32 bytes per entry equals 64 KB. The Valid bit tells whether there is valid data in
the cache line.
Cache entry layout: V (Valid bit) | Tag (16 bits) | Data (32 bytes)
When a program generates a 32-bit address, it has the following form:
Tag - 16 bits
Line - 11 bits
Word - 3 bits
Byte - 2 bits
To see if the data item is in the cache, take the 11-bit LINE portion, which selects one of the 2048
cache entries. The 16-bit TAG of the address is compared to the 16-bit Tag value stored in that
entry. If the entry is valid and the tags match, the data is there. The 3-bit WORD portion of the
address tells you which of the 8 words (32 bytes) in the cache line should be fetched, and the 2-bit
BYTE portion tells you which of that word's four bytes to fetch when a single byte is needed.
Note: since the cache holds 64 KB, it could hold the data for addresses 0 - 65535. But an entry may
instead hold data from addresses 65536 - 131071, and so on, because addresses that differ by a
multiple of 65536 map to the same cache line. That is why you must compare the TAG fields to see
if there is a match.
If there is no match, there is a cache miss: the CPU must fetch the data (the whole 32-byte line)
from main memory, store it in that cache entry, and update the entry's Tag, wiping out the old value.
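The lookup and miss handling described above fit in a small Python class. This is a sketch, not a hardware description: it assumes the 2048-line, 32-bytes-per-line layout from the notes, models main memory as a `bytearray`, reads single bytes only, and ignores writes. The class and method names are hypothetical.

```python
class DirectMappedCache:
    """Direct-mapped cache: 2048 lines of 32 bytes, 16-bit tags, 32-bit addresses."""
    NUM_LINES = 2048
    LINE_BYTES = 32

    def __init__(self, memory: bytearray) -> None:
        self.memory = memory
        self.valid = [False] * self.NUM_LINES
        self.tags = [0] * self.NUM_LINES
        self.lines = [bytes(self.LINE_BYTES)] * self.NUM_LINES

    def read_byte(self, address: int) -> tuple:
        """Return (byte value, hit?) for a 32-bit byte address."""
        tag = (address >> 16) & 0xFFFF   # top 16 bits
        line = (address >> 5) & 0x7FF    # next 11 bits select the cache entry
        offset = address & 0x1F          # WORD (3 bits) and BYTE (2 bits) together
        hit = self.valid[line] and self.tags[line] == tag
        if not hit:
            # Miss: fetch the whole 32-byte line, overwriting the entry's old contents.
            start = address & ~0x1F
            self.lines[line] = bytes(self.memory[start:start + self.LINE_BYTES])
            self.tags[line] = tag
            self.valid[line] = True
        return self.lines[line][offset], hit

memory = bytearray(range(256)) * 512   # 128 KB; byte i holds i % 256
cache = DirectMappedCache(memory)
print(cache.read_byte(36))          # (36, False): first access is a miss
print(cache.read_byte(40))          # (40, True): same line, so a hit
print(cache.read_byte(65536 + 36))  # (36, False): same line, different tag
print(cache.read_byte(36))          # (36, False): evicted by the previous access
```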
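For example, suppose the CPU wants to fetch the data at location 36 (decimal), which is 00000024
(hexadecimal). The address splits into fields as follows:
tag                   line            word   byte
0000 0000 0000 0000   0000 0000 001   001    00
So the tag is 0, the line is 1, the word is 1, and the byte is 0.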
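The field extraction in this example can be checked mechanically. The hypothetical `split_address` function applies the 16/11/3/2-bit split with shifts and masks; the second call shows that address 65572 (36 + 65536) lands in the same cache line with a different tag, which is exactly why the tags must be compared.

```python
def split_address(address: int) -> tuple:
    """Split a 32-bit address into (tag, line, word, byte) using 16/11/3/2 bits."""
    tag = (address >> 16) & 0xFFFF
    line = (address >> 5) & 0x7FF
    word = (address >> 2) & 0x7
    byte = address & 0x3
    return tag, line, word, byte

print(split_address(0x00000024))  # (0, 1, 1, 0)
print(split_address(36 + 65536))  # (1, 1, 1, 0)
```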
2. Branch prediction and speculative execution
The processor tries to predict which way a branch will go and then fetches (and may begin
executing) the machine instructions along the predicted path.
Dynamic branch prediction - The hardware keeps a history table that records, for each recently
executed branch, whether it branched back (was taken) or not, and predicts from that history. This adds overhead.
Static branch prediction - A loop that runs 1000 times branches back 999 times, so go ahead and
load the instructions as if the branch back will be taken. This is simpler, but it fails 1 time out of
1000.
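The difference between the two strategies shows up when counting mispredictions for the 1000-iteration loop. The sketch uses the simplest possible dynamic scheme, a one-bit history table that predicts whatever the branch did last time (real predictors usually keep more history). The initial "not taken" guess, the branch address 0x40 and all names are assumptions for illustration.

```python
from typing import Dict, List

def static_taken_misses(outcomes: List[bool]) -> int:
    """Static prediction: always predict that the backward branch is taken."""
    return sum(1 for taken in outcomes if not taken)

class OneBitPredictor:
    """History table mapping a branch address to that branch's last outcome."""
    def __init__(self) -> None:
        self.history: Dict[int, bool] = {}

    def predict(self, branch_address: int) -> bool:
        return self.history.get(branch_address, False)  # unseen branch: guess not taken

    def update(self, branch_address: int, taken: bool) -> None:
        self.history[branch_address] = taken

def dynamic_misses(branch_address: int, outcomes: List[bool]) -> int:
    predictor = OneBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict(branch_address) != taken:
            misses += 1
        predictor.update(branch_address, taken)
    return misses

loop_branch = [True] * 999 + [False]      # branches back 999 times, then falls through
print(static_taken_misses(loop_branch))   # 1
print(dynamic_misses(0x40, loop_branch))  # 2
```

The one-bit table misses twice: on the first iteration, because the branch has no history yet, and on the exit, because the table still remembers "taken".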
3. Out-of-order execution
Sometimes you can rearrange the order of instructions without changing the final program
outcome. For example, by moving the write operation up one statement, you can start it sooner,
which helps because I/O operations typically take much longer than other instructions.
Add two register contents and store in a register
Increment a counter by 1
Start a write operation
can be changed to:
Add two register contents and store in a register
Start a write operation
Increment a counter by 1
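Whether two adjacent operations may be swapped comes down to data dependences: neither may write something the other reads or writes. A minimal Python sketch of that check, using hypothetical names (R1, R2, R3, counter, buffer) for the storage each operation touches:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    reads: frozenset
    writes: frozenset

def can_swap(first: Op, second: Op) -> bool:
    """True if exchanging the two operations cannot change the program's result."""
    return not (first.writes & second.reads       # read-after-write
                or first.reads & second.writes    # write-after-read
                or first.writes & second.writes)  # write-after-write

add = Op("add", frozenset({"R2", "R3"}), frozenset({"R1"}))
increment = Op("increment", frozenset({"counter"}), frozenset({"counter"}))
start_write = Op("start write", frozenset({"buffer"}), frozenset())
store_r1 = Op("store R1", frozenset({"R1"}), frozenset({"buffer"}))
print(can_swap(increment, start_write))  # True: no shared data
print(can_swap(add, store_r1))           # False: the store needs R1 from the add
```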
4. Register renaming
Keep track of when a variable is “alive”. When it is no longer alive, reuse the register it was in.
Counter: integer;
Counter := 0;            { put the Counter value into some register }
Read a value;
Counter := Counter + 1;
Sum := Sum + value;
Loop back
Print Counter;           { Counter not used after this point, so reuse this register }
NewValue := Value;
Read a name;
:
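The idea of reusing a register once its variable is no longer alive can be sketched as follows. This sketch models only the liveness idea described above, as a simple scan over straight-line code (the loop in the example is ignored). Each statement is described just by the set of variable names it uses, and the function name and inputs are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def reuse_registers(statements: List[Set[str]], num_registers: int) -> List[Tuple[str, int]]:
    """Assign registers to variables, freeing a register after its variable's last use.

    Returns (variable, register) pairs in the order the assignments were made.
    Raises IndexError if more variables are alive at once than there are registers.
    """
    last_use: Dict[str, int] = {}
    for i, used in enumerate(statements):
        for var in used:
            last_use[var] = i                   # later statements overwrite earlier ones
    free = list(range(num_registers))
    assigned: Dict[str, int] = {}
    log: List[Tuple[str, int]] = []
    for i, used in enumerate(statements):
        for var in sorted(used):
            if var not in assigned:             # first use: take a free register
                assigned[var] = free.pop(0)
                log.append((var, assigned[var]))
        for var in sorted(used):
            if last_use[var] == i:              # variable is dead after this statement
                free.append(assigned.pop(var))  # so its register can be reused
    return log

program = [{"Counter"}, {"Counter", "Value"}, {"NewValue", "Value"}]
print(reuse_registers(program, 2))
# [('Counter', 0), ('Value', 1), ('NewValue', 0)]
```

NewValue receives register 0 because Counter, which held it, is dead after its last use.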