CS 3220 Project ISA

• We will be designing a processor, so we need an ISA
• What do we want in our ISA?
– Easy to decode (you’ll have to write this in Verilog)
– Easy to write assembler for (you’ll have to write one)
– Easy to write applications for (you’ll do this, too)
• Similar tradeoffs are involved in designing real CPUs
– Plus backward compatibility
– But for CS 3220 we don’t want backward compatibility!
– It would encourage laziness and cheating
(Verilog code may already be posted somewhere)
CS 3220 Fall 2011 - Prof. Milos Prvulovic
• CISC or RISC?
– Definitely RISC (much easier to design)
• Fixed-size or variable size?
– Definitely fixed (fetch and decode much easier)
• How many things can be read or written per instruction?
– Each register read (>1) complicates register file
– Each register write (>1) complicates register file a lot!
– Each memory read or write (>1) creates lots of
problems (memory ports, pipeline stages, hazards).
• How will we access memory?
– Do we use only LD/ST, or do we allow
memory operands in other kinds of instructions?
• Only LD/ST is far simpler to implement because:
– Mem operands in ADD, SUB, etc. require many
“flavors” for each instruction (tough to decode)
• And we need to describe the entire decoding logic in Verilog 
– Don’t want multiple memory accesses per inst!
• Even one memory stage in the pipeline is complex enough 
• OK, we’ll have LW, SW
• Let’s have some arithmetic
– ADD, SUB, what else?
• How about some logic?
– Option 1: AND, OR, NOT, XOR, etc.
– Option 2: Let’s just have one! Which one? NAND!
• Can “fake” others using NAND, e.g. “NOT A” is “A NAND A”
– Let’s use Option 2 for now
• Easier to write assembler, easier to decode
• But leave room (unused opcodes) for others
• Comparisons? It depends…
– Option 1: Conditional branches do comparisons
– Option 2: Comparison instructions, one cond. branch
– Option 3: Mix of the two
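A quick way to convince yourself that NAND alone is enough is to model it in software. The Python sketch below (function names are illustrative, not part of the ISA) builds NOT, AND, OR, and XOR from a 32-bit NAND; the comments note how many NAND instructions each would cost in assembly.

```python
MASK32 = 0xFFFFFFFF  # registers are modeled as 32-bit unsigned values


def nand(a: int, b: int) -> int:
    """The one logic instruction we keep: bitwise NAND."""
    return ~(a & b) & MASK32


def not_(a: int) -> int:
    return nand(a, a)  # 1 NAND: NOT A == A NAND A


def and_(a: int, b: int) -> int:
    t = nand(a, b)
    return nand(t, t)  # 2 NANDs: invert the NAND result


def or_(a: int, b: int) -> int:
    return nand(nand(a, a), nand(b, b))  # 3 NANDs: De Morgan's law


def xor_(a: int, b: int) -> int:
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))  # 4 NANDs


a, b = 0x0F0F1234, 0x00FF5678
assert not_(a) == ~a & MASK32
assert and_(a, b) == a & b
assert or_(a, b) == a | b
assert xor_(a, b) == a ^ b
```

The extra instructions per operation are the price of having a single logic opcode, which is why it pays to leave unused opcodes for adding the others later.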
• Conditional branches
– PC relative, need decent-sized offset operand
– Hard to write if-then-else and loops if branch
only goes e.g. 3 instructions forward or back
• How will we call procedures?
– Option 1: Special branch that saves return address
– Option 2: Save RA in SW, use normal branch
• How will we return from procedures?
– Option 1: Specialized “RET”
– Option 2: Jump-to-address-in-register (JR)
• Let’s have only one call/jump/return inst for now!
– Similar to JALR instruction from CS 2200
– Syntax would be JAL Rdst,Imm(Rsrc)
• Typical conditional branches
BEQ R1,R2,Label ; Go to Label if R1==R2
• Can also have BLT, BLE, BNE, BGT, BGE
• Need to encode two registers in the instruction
BEQZ R1, Label ; Go to Label if R1==0
• Can also have BNEZ, BLEZ, etc.
• Need to encode only one register in the instruction
(so we can have a 6-bit offset)
• Can have implicit operand, e.g. always R1
BEQZ Label ; If R1==0 go to Label
• But R1 won’t be very useful for anything else
• Need at least 2 registers to do ALU operations
• Plus one to be a stack pointer
• Plus one to save return address
– Unless we want to save it directly to memory
• Nice to have a few extra
– One for return value (to avoid saving it to stack)
– Some to pass parameters? Need at least 2 (more is even better)
• Need at least one for system use
– We’ll work on this in the last two projects
• OK, this is already 8 or more, so let’s have 32
– When writing code in assembler, we’ll see that more is better 
• Bits in instruction word? Hmm, let’s see
– Need room for opcode
• How many types of instructions do we have?
• Can have a secondary opcode for some (e.g. for ADD,SUB, etc.)
– Need room for register operands
• Do we want 1, 2, or 3 of those? 3!
• With 32 registers this uses 15 bits (3 × 5) of the instruction word
– Need room for immediate operands
• The more the better, but too few will be a problem
• Let’s have 32-bit instruction word
– 8 bits is not really an option (not enough room)
– 16 bits is very tight (with 16 regs, only 4 bits left for opcode)
– So let’s do 32 (allows large offsets, more regs, etc.)
• How about 8-bit registers?
– Will need multi-word values often (e.g. loop counters)
– PC must be larger than this, procedure calls get tricky
• Can we do with 16-bit registers?
– Most loops and programs will be OK
– Immediate operand can load entire constant (nice)
– Can display entire word on HEX display 
• But it makes sense to have 32-bit registers
– Same as instruction word
– Almost never have to worry about overflows and such
CS 4290/6290 – Fall 2009 – Prof. Milos Prvulovic
• Byte-addressed or word-addressed?
• Word-addressed is simpler
– Only need LD/ST instruction, vs. LW/SW, LB/SB, etc.
– Don’t have to worry about alignment
• But
– Hard to switch apps to byte-addressed later
– Can’t use e.g. 16-bit memory locations
– We can achieve most of the HW simplicity
if we require word-alignment
• So we’ll have byte-addressed aligned LW/SW only
– Can drop alignment limitations later if we want to
– And can add LB/SB, LH/SH later if we want to
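A minimal model of what "byte-addressed, aligned LW/SW only" means for the memory hardware, assuming a word-organized memory array (the class and method names are made up for illustration): the two least-significant address bits must be zero, and the remaining bits simply index a word.

```python
class WordMemory:
    """Byte-addressed memory that supports only aligned 32-bit LW/SW (a sketch)."""

    def __init__(self, num_words: int) -> None:
        self.words = [0] * num_words

    def _index(self, byte_addr: int) -> int:
        if byte_addr & 0x3:  # low two bits must be 00 for a word access
            raise ValueError(f"unaligned word access at 0x{byte_addr:08x}")
        return byte_addr >> 2  # drop the byte-offset bits to get a word index

    def lw(self, byte_addr: int) -> int:
        return self.words[self._index(byte_addr)]

    def sw(self, byte_addr: int, value: int) -> None:
        self.words[self._index(byte_addr)] = value & 0xFFFFFFFF
```

In hardware this is almost free: the word index is just the address without its two low bits, which is why alignment recovers most of the simplicity of word addressing.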
• How many bits for the opcode?
• For insts w/ 3 reg operands, 15 bits already used
– Leaves 17 bits for opcode! But…
• For insts w/ 2 reg and 1 imm operand
– E.g. LW R1,-4(R2), ADDI R1,R2,64, BNE R1,R2,Label
– Imm and opcode must fit in 22 bits (10 are used for register numbers)
• Let’s have a 16-bit immediate and 6-bit opcode
– Very few “reach” issues in branches and LW/SW
– Fairly large constants in ADDI, SUBI, ANDI, etc.
– We have 64 opcodes
• Sounds like a lot, but it’s not 
• We’ll use a trick called “secondary opcode” to save these “primary opcodes”
• Have a smaller primary opcode (our six bits)
– Instructions with an imm operand only have this opcode
• Can only have up to 64 such instructions
– Instructions without an imm operand have plenty of bits left over
• E.g. ADD Rd,Rs,Rt uses just 21 bits for primary opcode and register numbers
• Idea: Use these extra bits for a secondary opcode
– Uses only one primary opcode for many ALU instructions
– The extra 11 bits in these instructions => the actual operation
– We don’t need all 11, so let’s use 6 of these
• Primary opcode of 000000 means “3-reg ALU inst”
• Secondary opcode determines the actual inst
– E.g. 000000 is NOP, 000001 is ADD, etc.
• Still need to assign specific opcodes to instructions
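The two-level decode can be sketched in Python as follows. Only the NOP (000000) and ADD (000001) secondary opcodes come from the example above; the SUB and NAND codes in the table are hypothetical placeholders, since specific numbers have not been assigned yet.

```python
MASK32 = 0xFFFFFFFF
OP1_ALUR = 0b000000  # primary opcode meaning "3-reg ALU inst"

# Secondary opcode -> operation. NOP and ADD follow the example above;
# the SUB and NAND codes are hypothetical placeholders.
OP2_OPS = {
    0b000000: lambda x, y: None,               # NOP: nothing to write
    0b000001: lambda x, y: (x + y) & MASK32,   # ADD
    0b000010: lambda x, y: (x - y) & MASK32,   # SUB (hypothetical code)
    0b000011: lambda x, y: ~(x & y) & MASK32,  # NAND (hypothetical code)
}


def alu_result(op1: int, op2: int, x: int, y: int):
    """Two-level decode: OP1 selects 'ALU-register', OP2 selects the operation."""
    if op1 != OP1_ALUR:
        raise ValueError(f"OP1 {op1:06b} is not a 3-reg ALU instruction")
    if op2 not in OP2_OPS:
        raise ValueError(f"unassigned secondary opcode {op2:06b}")
    return OP2_OPS[op2](x, y)
```

Adding another register-register ALU operation only needs a new OP2 entry, so the 64 primary opcodes stay free for instructions that need an immediate.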
• Does it matter which inst gets which opcode?
– E.g. LW is 0, ADDI is 1, 3-reg ALU is 2, etc.?
• Make the decoding easy!
– Let some opcode bits tell us what kind of inst we have
• Assigning opcode numbers as a list is messy
– Use an opcode chart
• We have 6-bit primary opcodes (two octal digits)
[Primary opcode (OP1) chart: rows = more significant 3 bits (0-7), columns = less significant 3 bits (0-7); only ALUR is assigned so far, at row 0, column 0 (OP1 = 000000)]
• When OP1 (primary opcode) is 000000,
the secondary opcode is decoded using:
[Secondary opcode (OP2) chart: rows = more significant 3 bits, columns = less significant 3 bits; entries: ADD, SUB, AND, OR, XOR, NAND, NOR, NXOR, EQ, LT, LE, NE, GE, GT]
– Why is AND not here?
– Why is NAND not here?
– Why is EQ not here?
– Why are GT, GE swapped?
• We definitely need LW/SW and BEQ/BNE
[Primary opcode (OP1) chart, entries so far: ALUR, LW, SW, BEQ, BNE]
– Why are BEQ, BNE here?
• Where do we put JAL (Call/Return/Jump)?
– No immediate operand – use secondary opcode!
[Primary opcode (OP1) chart, entries so far: ALUR, LW, SW, JAL, BEQ, BNE]
• I-format ALU instructions use an immediate => can’t use OP2!
• I-format variant for every ALU operation?
– A 3-register ALU instruction spends one OP2 opcode
– An I-format ALU inst spends an entire OP1 opcode
• Pick carefully which ones we provide!
– Avoid redundancy – don’t need both ADDI and SUBI!
• SUBI R1,R2,N == ADDI R1,R2,-N
– Add only instructions we will need often
• Will often increment counters, pointers, and such – we want ADDI
• Will often mask values we read from I/O – we want ANDI
• Will often combine bit-fields for I/O – we want ORI
• OP1 for ANDI, ORI, ADDI
[Primary opcode (OP1) chart, entries so far: ALUR, ANDI, ORI, ADDI, LW, SW, JAL, BEQ, BNE]
• How would you put a 32-bit constant into a reg?
– Start with zero in a register (easy, use XOR)
– ADDI a 16-bit constant… OK, half-way there!
– What now?
• Errr…
• Let’s add a HI instruction!
– Gets lower 16 bits from src reg
– The upper 16 bits come from the immediate operand
– Can use ORI then HI to put 32-bit value into a reg
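A sketch of the ORI-then-HI sequence, modeling the two instructions in Python. It assumes ORI sign-extends its immediate (matching the ry = rx op sxt(imm) definition of the I-format) and that HI computes {imm, rx[15:0]}. In assembly this would be something like ORI R8,Zero,0xBEEF followed by HI R8,R8,0xDEAD; any sign-extension ones that ORI leaves in the upper half are harmless because HI overwrites those bits.

```python
MASK32 = 0xFFFFFFFF


def sxt16(imm: int) -> int:
    """Sign-extend a 16-bit immediate to a 32-bit register value."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def ori(rx: int, imm: int) -> int:
    return (rx | sxt16(imm)) & MASK32


def hi(rx: int, imm: int) -> int:
    # Upper 16 bits come from the immediate, lower 16 bits from the source reg
    return ((imm & 0xFFFF) << 16) | (rx & 0xFFFF)


def load_const32(value: int) -> int:
    """Models ORI Rt,Zero,lo16 followed by HI Rt,Rt,hi16 (Zero holds 0)."""
    lo16 = value & 0xFFFF
    hi16 = (value >> 16) & 0xFFFF
    rt = ori(0, lo16)    # lower half correct; upper half may be all ones
    return hi(rt, hi16)  # upper half replaced by hi16
```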
• OP1 for HI
[Primary opcode (OP1) chart, entries so far: ALUR, ANDI, ORI, HI, ADDI, LW, SW, JAL, BEQ, BNE]
• Which bits in the instruction word are
– The primary opcode?
– The secondary opcode?
– The register numbers?
– The offset?
• Obviously, some of these have to overlap
– ADD Rz,Rx,Ry
• Using 21 bits (6 for OP1 and 3*5 for regnos)
– ADDI Rz,Rx,Imm
• Using all 32 bits, so 16-bit Imm here overlaps with Ry above
• Three things to worry about
– Must be able to find the primary opcode
• All insts must have it in the same place
– Speed
• Things we need first should be in the same place in all insts
– Complexity of decoding
• Good if same things are in the same place even if they don’t have to be
• Good if nothing is broken into multiple pieces (e.g. offset)
• Traditionally, primary opcode in most-significant bits
– So bits 31:26 are the primary opcode
• Traditionally, offset/immed in least-significant bits
– So bits 15:0 are the offset/immed (when it is used)
• Register numbers in-between
– Bits 25:21, bits 20:16, and (if no immediate) bits 15:11
• Secondary opcode must be somewhere in bits 10:0
– We only need six bits, so we’ll use 5:0
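To make the layout concrete, here is a Python sketch that packs and unpacks both instruction shapes using the bit positions above. The register fields are named rx, ry, rz to match the format summary in these notes; the opcode values in the test lines are arbitrary. Note that the decoder slices out every field without first knowing the opcode, which is the payoff of keeping fields in fixed places.

```python
def _field(value: int, width: int, name: str) -> int:
    """Check that an unsigned field value fits in `width` bits."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    return value


def encode_r(op1: int, rx: int, ry: int, rz: int, op2: int) -> int:
    """op1[31:26] rx[25:21] ry[20:16] rz[15:11] zeros[10:6] op2[5:0]"""
    return ((_field(op1, 6, "op1") << 26) | (_field(rx, 5, "rx") << 21)
            | (_field(ry, 5, "ry") << 16) | (_field(rz, 5, "rz") << 11)
            | _field(op2, 6, "op2"))


def encode_i(op1: int, rx: int, ry: int, imm: int) -> int:
    """op1[31:26] rx[25:21] ry[20:16] imm[15:0]; imm may be negative."""
    if not -0x8000 <= imm <= 0xFFFF:
        raise ValueError(f"imm={imm} does not fit in 16 bits")
    return ((_field(op1, 6, "op1") << 26) | (_field(rx, 5, "rx") << 21)
            | (_field(ry, 5, "ry") << 16) | (imm & 0xFFFF))


def decode_fields(word: int) -> dict:
    """Slice every field at once; which ones matter depends on op1."""
    return {
        "op1": (word >> 26) & 0x3F,
        "rx": (word >> 21) & 0x1F,
        "ry": (word >> 16) & 0x1F,
        "rz": (word >> 11) & 0x1F,
        "op2": word & 0x3F,
        "imm": word & 0xFFFF,
    }
```

Because the two source register numbers always sit in 25:21 and 20:16, the register file can be read with them in parallel with decoding op1.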
• Which register number is which?
– Sometimes we need two source registers (e.g. BEQ)
• And we have bits 25:21 and 20:16 for those
– Sometimes we need one src and one dst (e.g. ADDI)
• These must also be in 25:21 and 20:16
• Can’t always have src and dst in the same place!
– Either 25:21 or 20:16 must act as both src and dst!
• We always have the first src register
– So make 25:21 always be that first src register
• Now, 20:16 is src in BEQ but dst in ADDI
– We are left with no choice about that
• Things are not so bad (yet)
– Can use 25:21 and 20:16 to read two registers,
then if only one is needed, just don’t use the one we read using 20:16
– Can use 20:16 as write-regno, then enable the write only if it is needed
• But for ADD, do we use 20:16 as the second src or as the dst?
– If 20:16 is second src for ADD,
the reg to write can be in 20:16 (ADDI) or in 15:11 (ADD)
but the second src (if it is needed) is always in 20:16
– If 20:16 is dst for ADD,
the reg to write is always in 20:16 (if there is a write)
but the 2nd reg to read is sometimes in 20:16 (BEQ) and sometimes in 15:11 (ADD)
– Which option is better?
• Option 1: Second src always in 20:16
– Can read 2nd src without decoding
(throw away what we read if no 2nd src needed)
but need to decode the primary opcode to select dst
• Option 2: Destination reg always in 20:16
– 2nd src sometimes in 20:16 and sometimes in 15:11
– Need to decode OP1 to know what to read,
but register number for writing known without decoding
• Which one is better?
• {op1,rx,ry,rz,5’b0,op2}
– This format is used when OP1 is ALU3 or JMP
– For ALU3, function is rz = rx OP2 ry
• {op1,rx,ry,imm}
– This format is used when OP1 is ADDI/ANDI/ORI, HI, LW/SW, BEQ/BNE, or JAL
– For ADDI/ANDI/ORI, the function is ry = rx op sxt(imm)
– For HI, the function is ry = {imm,rx[15:0]}
– For LW, ry = mem[rx + sxt(imm)]
– For SW, mem[rx + sxt(imm)] = ry
– For BEQ, BNE, if(rx cmp ry) PC=PC+4+(sxt(imm)*4)
– For JAL ry,imm(rx), the operation is ry<=PC+4; PC<=rx+(sxt(imm)*4)
(note the <= assignments! these happen simultaneously)
– Note that ry is a src register in SW and BEQ/BNE,
but it is a dst register for LW, ADDI/ANDI/ORI, HI, and JAL!
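The I-format semantics above can be sanity-checked with a small interpreter. This Python sketch dispatches on mnemonic strings rather than opcode numbers (the numbers are not fixed here), uses a dict as a sparse byte-addressed memory, and computes the JAL target from the old rx value before writing ry, mirroring the simultaneous <= assignments. The function name execute_i is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sxt(imm: int) -> int:
    """Sign-extend the 16-bit immediate field (returns a signed Python int)."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def execute_i(mnem: str, rx: int, ry: int, imm: int,
              regs: list, mem: dict, pc: int) -> int:
    """Execute one {op1,rx,ry,imm} instruction and return the next PC.

    regs: 32 register values; mem: byte address -> 32-bit word.
    """
    src = regs[rx]                        # rx is always a source
    next_pc = (pc + 4) & MASK32
    addr = (src + sxt(imm)) & MASK32      # rx + sxt(imm)
    if mnem == "ADDI":
        regs[ry] = addr
    elif mnem == "ANDI":
        regs[ry] = src & sxt(imm) & MASK32
    elif mnem == "ORI":
        regs[ry] = (src | sxt(imm)) & MASK32
    elif mnem == "HI":
        regs[ry] = ((imm & 0xFFFF) << 16) | (src & 0xFFFF)
    elif mnem == "LW":
        regs[ry] = mem.get(addr, 0)       # ry is a destination here...
    elif mnem == "SW":
        mem[addr] = regs[ry]              # ...but a source here
    elif mnem in ("BEQ", "BNE"):
        equal = src == regs[ry]
        if equal == (mnem == "BEQ"):
            next_pc = (pc + 4 + sxt(imm) * 4) & MASK32
    elif mnem == "JAL":
        target = (src + sxt(imm) * 4) & MASK32  # uses the old rx value
        regs[ry] = next_pc                      # correct even if ry == rx
        next_pc = target
    else:
        raise ValueError(f"not an I-format mnemonic: {mnem}")
    return next_pc
```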
• Instruction opcodes and register names
– Are reserved words (can’t be used as labels)
– Appear in either lowercase or uppercase
– If there is a destination register, it is listed first
• Labels
– Created using a name and then “:” at the start of a line
– Corresponds to the address where the label is created
• Immediate operands – number or label
– If number, hex (C format, e.g. 0xffff) or decimal (can have - sign)
– If label, just use the name of the label (without “:”)
• For PC-relative, the immediate field is (label_addr-PC-4)/4, the word offset that BEQ/BNE multiply by 4
• For other insts, the immediate field is 16 least-significant bits of label_addr
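A sketch of how an assembler might compute these fields, assuming PC is the address of the branch instruction itself. Because BEQ/BNE compute PC+4+(sxt(imm)*4), the byte distance label_addr-PC-4 has to be divided by 4 before it is stored. The function names are hypothetical.

```python
def branch_imm(label_addr: int, pc: int) -> int:
    """Immediate field for a BEQ/BNE located at `pc` that targets `label_addr`."""
    delta = label_addr - pc - 4           # bytes past the next instruction
    if delta % 4:
        raise ValueError("branch target must be word-aligned")
    words = delta // 4                    # BEQ/BNE multiply sxt(imm) by 4
    if not -0x8000 <= words <= 0x7FFF:
        raise ValueError("branch target out of 16-bit reach")
    return words & 0xFFFF                 # two's-complement 16-bit field


def absolute_imm(label_addr: int) -> int:
    """Immediate field for non-PC-relative uses: 16 LSBs of the address."""
    return label_addr & 0xFFFF
```

For example, a backward branch at 0x30 to a label at 0x20 gets the field 0xFFFB (-5), and 0x30 + 4 + (-5 × 4) lands back on 0x20.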
• Each register has multiple names
• R0 is Zero (should always be zero)
• R1..R4 are also A0..A3 (function arguments, caller-saved)
• R5 is also RV (return value, caller-saved)
• R6 and R7 are reserved for assembler use (we’ll see later for what)
• R8..R15 are also T0..T7 (temporaries, caller-saved)
• R16..R23 are also S0..S7 (callee-saved temporaries)
• R24..R27 are reserved for system use (we’ll see later for what)
• R28 is GP (global pointer)
• R29 is FP (frame pointer)
• R30 is SP (stack pointer)
• R31 is RA (return address)
– Stack grows down, SP points to lowest in-use address
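In an assembler these aliases are just entries in a lookup table. A Python sketch (the helper name reg_number is hypothetical) that also accepts lowercase names, since register names may appear in either case:

```python
def _build_reg_names() -> dict:
    names = {f"R{i}": i for i in range(32)}
    names["ZERO"] = 0
    names.update({f"A{i}": 1 + i for i in range(4)})   # A0..A3 = R1..R4
    names["RV"] = 5
    names.update({f"T{i}": 8 + i for i in range(8)})   # T0..T7 = R8..R15
    names.update({f"S{i}": 16 + i for i in range(8)})  # S0..S7 = R16..R23
    names.update({"GP": 28, "FP": 29, "SP": 30, "RA": 31})
    return names


REG_NAMES = _build_reg_names()


def reg_number(token: str) -> int:
    """Map a register operand such as 'sp', 'T3' or 'R17' to its number."""
    try:
        return REG_NAMES[token.upper()]
    except KeyError:
        raise ValueError(f"unknown register name: {token!r}") from None
```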
• .ORG <number>
– Changes “current” address to <number>
• .WORD <value>
– Places 32-bit word <value> at the current address
– <value> can be a number or a label name
– If label name, value is the full 32-bit label_addr
• .NAME <name>=<value>
– Defines a name (label) with a given value (number)
– Otherwise we would have to name constants using
.ORG 1
One:
• Do not actually exist in the ISA
– Translate into existing instructions
• We will have (for now)
SUBI Ri,Rj,Imm    => ADDI Ri,Rj,-Imm
NOT Ri,Rj         => NOR Ri,Rj,Rj
BR Label          => BEQ Zero,Zero,Label
BLT Ri,Rj,Label   => LT R6,Ri,Rj
                     BNE R6,Zero,Label
BLE,BGT,BGE       => Similar to BLT
CALL Imm(Ri)      => JAL RA,Imm(Ri)
JMP Imm(Ri)       => JAL R6,Imm(Ri)
RET               => JAL R6,0(RA)
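These expansions are simple text rewrites, so an assembler can apply them before encoding. A Python sketch follows (the function name is hypothetical). BLE, BGT, and BGE are assumed to use the LE, GT, and GE comparisons from the secondary opcode chart, since the table only says "similar to BLT", and SUBI is assumed to get a numeric immediate.

```python
# Comparison used by each branch pseudo-instruction. BLT -> LT is from the
# table; the other three mappings are an assumption.
BRANCH_CMP = {"BLT": "LT", "BLE": "LE", "BGT": "GT", "BGE": "GE"}


def expand_pseudo(mnem: str, ops: list) -> list:
    """Rewrite one (pseudo-)instruction into real instructions, as text.

    ops are operand strings as written, e.g. ["T0", "T1", "Loop"].
    R6 is used as scratch because it is reserved for the assembler.
    """
    m = mnem.upper()
    if m == "SUBI":
        ri, rj, imm = ops
        return [f"ADDI {ri},{rj},{-int(imm, 0)}"]  # numeric Imm only
    if m == "NOT":
        ri, rj = ops
        return [f"NOR {ri},{rj},{rj}"]
    if m == "BR":
        return [f"BEQ Zero,Zero,{ops[0]}"]
    if m in BRANCH_CMP:
        ri, rj, label = ops
        return [f"{BRANCH_CMP[m]} R6,{ri},{rj}", f"BNE R6,Zero,{label}"]
    if m == "CALL":
        return [f"JAL RA,{ops[0]}"]
    if m == "JMP":
        return [f"JAL R6,{ops[0]}"]
    if m == "RET":
        return ["JAL R6,0(RA)"]
    return [f"{mnem} {','.join(ops)}"]  # already a real instruction
```

JMP and RET still need somewhere for JAL to write its return address, so they dump it into R6, which ordinary code never relies on.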
• Separate inst and data memory?
– Good: Our design will be faster, cheaper
– Bad: How does one load programs into memory?
• We’ll have separate imem and dmem for now
– We’ll see later how to unify them
• How much memory?
– There are 239,616 memory bits on-chip, so
– 8kB (2048 32-bit words) of imem
– 8kB (2048 32-bit words) of dmem
– Leaves about half of memory bits on the FPGA chip
(for register file, debugging in SignalTap, etc.)
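The arithmetic behind "about half", as a few lines of Python using only the numbers on this slide:

```python
ONCHIP_BITS = 239_616   # total on-chip memory bits quoted above
WORD_BITS = 32
WORDS_PER_MEM = 2048    # imem and dmem each hold 2048 words

bits_per_mem = WORDS_PER_MEM * WORD_BITS   # 65,536 bits = 8 kB
used_bits = 2 * bits_per_mem               # imem + dmem
left_bits = ONCHIP_BITS - used_bits
left_fraction = left_bits / ONCHIP_BITS

print(f"imem+dmem use {used_bits} bits; {left_bits} bits ({left_fraction:.0%}) remain")
```

This prints that the two memories use 131072 bits and leave 108544 bits (45%) for everything else.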
• We want our programs to
– Read SW, KEY (so we can interact with it)
– Write to HEX, LEDR, LEDG
– Maybe some more I/O
• Need instructions for this!
– Special instruction for each device, e.g. “WRLEDG”
• Extensions are hard (change processor as each device added)
– Special IN/OUT instructions
• Assign “addresses” to devices, then use IN/OUT to read/write
– Memory-mapped I/O (this is what we’ll use)
• Each device gets a memory address, LW/SW can be used for I/O
• Can’t use those memory locations as normal memory!
• Write an assembler
– Reads assembler listing for this project ISA
• Including pseudo instructions
– Outputs a file with 2048 32-bit words of memory
in the .mif file format (Test2.mif, Sorter2.mif)
• Verilog design of a multi-cycle processor
– Implements this ISA, PC starts at (byte address) 0x20
– Uses Sorter2.mif to pre-load its 8kB memory
– LW from address 0xFFFFFF00 reads KEY state
• Result of LW should be 0 when no KEY pressed, 0xF when all are pressed
• This means we actually need LW to get {28’b0,~KEY}
– LW from address 0xFFFFFF10 reads SW state
– SW to address 0xFFFFFF80 displays value on HEX display
– SW to address 0xFFFFFF90 writes to LEDR
– SW to address 0xFFFFFFA0 writes to LEDG
Don’t panic (yet)! Will do most of the design in lectures!
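A Python model of the address decode for these memory-mapped devices (class and attribute names are hypothetical). KEY is assumed active-low, which is what the {28’b0,~KEY} requirement implies: a raw 1 means "not pressed", so the bits are inverted before LW returns them. Any address not claimed by a device falls through to ordinary data memory.

```python
MASK32 = 0xFFFFFFFF
KEY_ADDR, SW_ADDR = 0xFFFFFF00, 0xFFFFFF10
HEX_ADDR, LEDR_ADDR, LEDG_ADDR = 0xFFFFFF80, 0xFFFFFF90, 0xFFFFFFA0


class Board:
    """Memory-mapped I/O sketch: LW/SW to device addresses never reach dmem."""

    def __init__(self) -> None:
        self.key_raw = 0xF   # 4 active-low push buttons, none pressed
        self.switches = 0
        self.hex = self.ledr = self.ledg = 0
        self.dmem = {}       # byte address -> word, for ordinary addresses

    def load_word(self, addr: int) -> int:
        if addr == KEY_ADDR:
            return ~self.key_raw & 0xF   # {28'b0, ~KEY}: 1 = pressed
        if addr == SW_ADDR:
            return self.switches
        return self.dmem.get(addr, 0)

    def store_word(self, addr: int, value: int) -> None:
        value &= MASK32
        if addr == HEX_ADDR:
            self.hex = value
        elif addr == LEDR_ADDR:
            self.ledr = value
        elif addr == LEDG_ADDR:
            self.ledg = value
        else:
            self.dmem[addr] = value
```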