CS 152 Computer Architecture and Engineering Lecture 2 - Simple Machine Implementations

advertisement

CS 152 Computer Architecture and

Engineering

Lecture 2 - Simple Machine

Implementations

Krste Asanovic

Electrical Engineering and Computer Sciences

University of California at Berkeley http://www.eecs.berkeley.edu/~krste http://inst.eecs.berkeley.edu/~cs152

Last Time in Lecture 1

• Computer Science at crossroads from sequential to parallel computing

• Computer Architecture >> ISAs and RTL

– CS152 is about interaction of hardware and software, and design of appropriate abstraction layers

• Comp. Arch. shaped by technology and applications

– History provides lessons for the future

• Cost of software development a large constraint on architecture

– Compatibility a key solution to software cost

• IBM 360 introduces notion of “family of machines” running same ISA but very different implementations

– Within same generation of machines

– “Future-proofing” for subsequent generations of machine

CS152Spring’09

1/27/2009 2

Burrough’s B5000 Stack Architecture:

An ALGOL Machine, Robert Barton, 1960

• Machine implementation can be completely hidden if the programmer is provided only a high-level language interface.

• Stack machine organization because stacks are convenient for:

1. expression evaluation;

2. subroutine calls, recursion, nested interrupts;

3. accessing variables in block-structured languages.

• B6700, a later model, had many more innovative features

– tagged data

– virtual memory

– multiple processors and memories

CS152Spring’09

1/27/2009 3

A Stack Machine

Processor stack

:

Main

Store

A Stack machine has a stack as a part of the processor state typical operations: push, pop, +, *, ...

Instructions like + implicitly specify the top 2 elements of the stack as operands.

1/27/2009 a push b

 b a push c

 c b a

CS152Spring’09 pop

 b a

4

Evaluation of Expressions

(a + b * c) / (a + d * c - e)

/ a

+ b

* c

+

e a * d c

Reverse Polish a b c * + a d c * + e - /

*

1/27/2009 CS152Spring’09 c a

Evaluation Stack

5

Evaluation of Expressions

(a + b * c) / (a + d * c - e)

/ a

+ b

* c

+

e a * d c

Reverse Polish a b c * + a d c * + e - / add

1/27/2009 CS152Spring’09

+ b * c

Evaluation Stack

6

Hardware organization of the stack

• Stack is part of the processor state

 stack must be bounded and small

 number of Registers, not the size of main memory

• Conceptually stack is unbounded

 a part of the stack is included in the processor state; the rest is kept in the main memory

1/27/2009 CS152Spring’09 7

Stack Operations and

Implicit Memory References

• Suppose the top 2 elements of the stack are kept in registers and the rest is kept in the memory.

Each push operation

 pop operation

1 memory reference

1 memory reference

No Good!

• Better performance can be got if the top N elements are kept in registers and memory references are made only when register stack overflows or underflows.

Issue - when to Load/Unload registers ?

CS152Spring’09

1/27/2009 8

Stack Size and Memory References

program push a push b push c

*

+ push a push d

-

/ push c

*

+ push e

1/27/2009 a b c * + a d c * + e - / stack (size = 2)

R0

R0 R1

R0 R1 R2

R0 R1

R0

R0 R1

R0 R1 R2

R0 R1 R2 R3

R0 R1 R2

R0 R1

R0 R1 R2

R0 R1

R0 memory refs a b c, ss(a) sf(a) a d, ss(a+b*c) c, ss(a) sf(a) sf(a+b*c) e,ss(a+b*c) sf(a+b*c)

4 stores, 4 fetches (implicit)

CS152Spring’09 9

Stack Size and Expression Evaluation

a and c are

“loaded” twice

 not the best use of registers!

a b c * + a d c * + e - / program push a push b push c

*

+ push a push d push c

*

+ push e

-

/ stack (size = 4)

R0

R0 R1

R0 R1 R2

R0 R1

R0

R0 R1

R0 R1 R2

R0 R1 R2 R3

R0 R1 R2

R0 R1

R0 R1 R2

R0 R1

R0

1/27/2009 CS152Spring’09 10

Register Usage in a GPR Machine

(a + b * c) / (a + d * c - e)

Reuse

R2

Reuse

R3

Reuse

R0

Load R0

Load R1

Load R2

Mul R2

Add R2

Load R3

Mul R3

Add R3

Load R0

Sub R3

Div R2

R0 d

R1

R0 e

R0

R3 a c b

R1

More control over register usage since registers can be named explicitly

Load

Load

Load

Ri m

Ri (Rj)

Ri (Rj) (Rk)

eliminates unnecessary

Loads and Stores

- fewer Registers but instructions may be longer!

CS152Spring’09

1/27/2009 11

Stack Machines: Essential features

• In addition to push, pop, + etc., the instruction set must provide the capability to

– refer to any element in the data area

– jump to any instruction in the code area

– move any element in the stack frame to the top machinery to carry out

+, -, etc.

SP

DP

PC

 push a push b push c

*

+ push e

/ code

CS152Spring’09

1/27/2009 stack a b c

.

.

.

data

12

Stack versus GPR Organization

Amdahl, Blaauw and Brooks, 1964

1. The performance advantage of push down stack organization is derived from the presence of fast registers and not the way they are used.

2.“Surfacing” of data in stack which are “profitable” is approximately 50% because of constants and common subexpressions.

3. Advantage of instruction density because of implicit addresses is equaled if short addresses to specify registers are allowed.

4. Management of finite depth stack causes complexity.

5. Recursive subroutine advantage can be realized only with the help of an independent stack for addressing.

6. Fitting variable-length fields into fixed-width word is awkward.

CS152Spring’09

1/27/2009 13

Stack Machines

(Mostly)

Died by 1980

1. Stack programs are not smaller if short (Register) addresses are permitted.

2. Modern compilers can manage fast register space better than the stack discipline.

GPR’s and caches are better than stack and displays

Early language-directed architectures often did not take into account the role of compilers!

B5000, B6700, HP 3000, ICL 2900, Symbolics 3600

Some would claim that an echo of this mistake is visible in the SPARC architecture register windows more later…

1/27/2009 CS152Spring’09 14

Stacks post-1980

• Inmos Transputers (1985-2000)

– Designed to support many parallel processes in Occam language

– Fixed-height stack design simplified implementation

– Stack trashed on context swap (fast context switches)

– Inmos T800 was world’s fastest microprocessor in late 80’s

• Forth machines

– Direct support for Forth execution in small embedded real-time environments

– Several manufacturers (Rockwell, Patriot Scientific)

• Java Virtual Machine

– Designed for software emulation, not direct hardware execution

– Sun PicoJava implementation + others

• Intel x87 floating-point unit

– Severely broken stack model for FP arithmetic

– Deprecated in Pentium-4, replaced with SSE2 FP registers

CS152Spring’09

1/27/2009 15

Microprogramming

• A brief look at microprogrammed machines

– To show how to build very small processors with complex ISAs

– To help you understand where CISC machines came from

– Because it is still used in the most common machines (x86,

PowerPC, IBM360)

– As a gentle introduction into machine structures

– To help understand how technology drove the move to RISC

1/27/2009 CS152Spring’09 16

ISA to Microarchitecture Mapping

• ISA often designed with particular microarchitectural style in mind, e.g.,

– CISC  microcoded

– RISC  hardwired, pipelined

– VLIW  fixed-latency in-order pipelines

– JVM  software interpretation

• But can be implemented with any microarchitectural style

– Core 2 Duo: hardwired pipelined CISC (x86) machine (with some microcode support)

– This lecture: a microcoded RISC (MIPS) machine

– Intel could implement a dynamically scheduled outof-order VLIW (IA-64) processor

– ARM Jazelle: A hardware JVM processor

– Simics: Software-interpreted SPARC RISC machine

CS152Spring’09

1/27/2009 17

Microarchitecture:

Implementation of an ISA

Controller status lines control points

Data path

Structure: How components are connected.

Static

Behavior: How data moves between components

Dynamic

1/27/2009 CS152Spring’09 18

Microcontrol Unit

Maurice Wilkes, 1954

Embed the control logic state table in a memory array op conditional code flip-flop

1/27/2009

Next state

 address

Matrix A

Decoder

Matrix B

Control lines to

ALU, MUXs, Registers

CS152Spring’09 19

Microcoded Microarchitecture

busy?

zero?

opcode

 controller

(ROM) holds fixed

microcode instructions holds user program written in macrocode instructions (e.g.,

MIPS, x86, etc.)

1/27/2009

Datapath

Data Addr

Memory

(RAM) enMem

MemWrt

CS152Spring’09 20

The MIPS32 ISA

• Processor State

32 32-bit GPRs, R0 always contains a 0

16 double-precision/32 single-precision FPRs

FP status register, used for FP compares & exceptions

PC, the program counter some other special registers

See H&P

Appendix B for full description • Data types

8-bit byte, 16-bit half word

32-bit word for integers

32-bit word for single precision floating point

64-bit word for double precision floating point

• Load/Store style instruction set data addressing modes- immediate & indexed branch addressing modes- PC relative & register indirect

Byte addressable memory- big-endian mode

1/27/2009

All instructions are 32 bits

CS152Spring’09 21

MIPS Instruction Formats

ALU

ALUi

6

0

5 rs rt rd 0 func rd  (rs) func (rt) opcode rs rt

5 5 5 6 immediate rt  (rs) op immediate

Mem

6 opcode rs

5 5 16 rt displacement M[(rs) + displacement]

6 5 5 16 opcode rs offset

6 5 opcode rs

5 16

6 26 opcode offset

BEQZ, BNEZ

JR, JALR

J, JAL

CS152Spring’09

1/27/2009 22

A Bus-based Datapath for MIPS

Opcode ldIR OpSel ldA zero?

ldB 32(PC)

2

IR

ExtSel

2 enImm

Imm

Ext rd rt

A

ALU control enALU

ALU

B addr

32 GPRs

+ PC ...

3

RegSel

32-bit Reg data

RegWrt enReg busy ldMA

MA addr

Memory

MemWrt data enMem

Bus 32

Microinstruction: register to register transfer (17 control signals)

MA  PC means RegSel = PC; enReg=yes; ldMA= yes

B  Reg[rt] means RegSel = rt; enReg=yes; ldB = yes

1/27/2009 CS152Spring’09 23

Memory Module

addr busy

RAM we din dout

Write(1)/Read(0)

Enable bus

Assumption: Memory operates independently and is slow as compared to Reg-to-Reg transfers

(multiple CPU clock cycles per access)

1/27/2009 CS152Spring’09 24

Instruction Execution

Execution of a MIPS instruction involves

1. instruction fetch

2. decode and register fetch

3. ALU operation

4. memory operation (optional)

5. write back to register file (optional)

+ the computation of the

next instruction address

CS152Spring’09

1/27/2009 25

Microprogram Fragments

instr fetch: MA

PC

A

PC

IR

Memory

PC

A + 4 dispatch on OPcode

ALU:

ALUi: can be treated as a macro

A

Reg[rs]

B

Reg[rt]

Reg[rd]

 func(A,B) do instruction fetch

A

Reg[rs]

B

Imm

Reg[rt]

Opcode(A,B) do instruction fetch sign extension ...

CS152Spring’09

1/27/2009 26

Microprogram Fragments

(cont.)

LW: A  Reg[rs]

B  Imm

MA  A + B

Reg[rt]  Memory

do instruction fetch

A  PC

B  IR

PC  JumpTarg(A,B)

do instruction fetch

JumpTarg(A,B) =

{A[31:28],B[25:0],00}

A  Reg[rs]

If zero?(A) then go to bz-taken

do instruction fetch

A  PC

B  Imm << 2

PC  A + B

do instruction fetch

CS152Spring’09 27 1/27/2009

J: beqz: bz-taken:

MIPS Microcontroller:

first attempt

Opcode zero?

Busy (memory) latching the inputs may cause a one-cycle delay

ROM size ?

= 2 (opcode+status+s) words

Word size ?

= control+s bits

6

 PC (state) s addr

 Program ROM data

Control Signals (17)

CS152Spring’09

1/27/2009 s

How big is “s”?

next state

28

Microprogram in the ROM

worksheet

State Op zero? busy Control points fetch

0 fetch

1 fetch

1 fetch

2 fetch fetch

3

3

*

*

*

*

*

*

*

*

*

*

ALU *

* yes no IR  Memory

*

*

MA  PC

....

A 

PC 

PC

A + 4

* PC  A + 4 next-state fetch

1 fetch

1 fetch

2 fetch

3

?

ALU

0

ALU

0

ALU

1

ALU

2

*

*

*

1/27/2009

*

*

*

*

*

*

A  Reg[rs]

B  Reg[rt]

ALU

ALU

1

2

Reg[rd]  func(A,B) fetch

0

CS152Spring’09 29

Microprogram in the ROM

State Op zero? busy Control points fetch

0 fetch

1 fetch

1 fetch

2 fetch

3 fetch

3 fetch

3 fetch

3 fetch

3 fetch

3 fetch

3 fetch

3 fetch

3

...

ALU

ALU

0

ALU

1

2

1/27/2009

ALU

ALUi

LW

SW

J

*

*

*

*

JAL

JR beqz *

*

*

*

*

*

*

*

*

*

*

*

*

*

*

JALR *

*

*

*

*

*

*

*

*

*

*

*

*

* yes

MA  PC

....

no IR  Memory

* A  PC

PC  A + 4

PC  A + 4

PC  A + 4

PC  A + 4

PC  A + 4

PC  A + 4

PC  A + 4

PC  A + 4

PC  A + 4

*

*

*

A

B

Reg[rs]

Reg[rt]

ALU

ALU

1

2

Reg[rd]  func(A,B) fetch

0

CS152Spring’09 next-state fetch

1 fetch

1 fetch

2 fetch

3

ALU

0

ALUi

0

LW

0

SW

0

J

0

JAL

0

JR

0

JALR

0 beqz

0

30

Microprogram in the ROM

Cont.

State Op zero? busy Control points

ALUi

0

ALUi

1

ALUi

1

ALUi

2

...

J

J

0

J

1

2

...

beqz

0 beqz

1 beqz

1 beqz

2 beqz

3

...

* sExt uExt

*

*

*

*

*

*

*

*

*

*

*

*

*

*

*

*

* yes no

*

*

*

*

*

*

*

*

*

*

*

*

*

*

A

B

Reg[rs] sExt

A  PC

B  IR

PC  JumpTarg(A,B) fetch

0

A  PC

....

B  sExt

16

16

PC  A+B

(Imm) next-state

ALUi

1

ALUi

2

B  uExt

16

(Imm) ALUi

2

Reg[rd]  Op(A,B) fetch

0

A  Reg[rs]

(Imm)

J

J

1

2 beqz beqz fetch beqz fetch

1

2

0

3

0

1/27/2009

JumpTarg(A,B) = {A[31:28],B[25:0],00}

CS152Spring’09 31

Size of Control Store

status & opcode

/ w 

PC size = 2 (w+s) x (c + s) addr

Control ROM

Control signals / c data

/ s next

PC

MIPS: w = 6+2 c = 17 s = ?

no. of steps per opcode = 4 to 6 + fetch-sequence no. of states

(4 steps per op-group ) x op-groups

+ common sequences

= 4 x 8 + 10 states = 42 states

 s = 6

Control ROM = 2 (8+6) x 23 bits

48 Kbytes

1/27/2009 CS152Spring’09 32

Reducing Control Store Size

Control store has to be fast  expensive

• Reduce the ROM height (= address bits)

reduce inputs by extra external logic each input bit doubles the size of the control store

reduce states by grouping opcodes find common sequences of actions

condense input status bits combine all exceptions into one, i.e., exception/no-exception

• Reduce the ROM width

restrict the next-state encoding

Next, Dispatch on opcode, Wait for memory, ...

encode control signals (vertical microcode)

1/27/2009 CS152Spring’09 33

CS152 Administrivia

• Room change/decision:

– Either Soda 306 (here) for all but two lectures, back in 320 for those

– Or 182 Dwinelle for all lectures

– Vote ?

• Class schedule almost up to date (but might need to change quiz depending on vote above)

• Lab 1 coming out on Thursday, together with PS1

CS152Spring’09

1/27/2009 34

Problem Sets

• Designed to help you learn material and practice for quiz

• Must hand in day before section that reviews problem set, which will be shortly before quiz

• Solutions handed out in review section

• Quiz assumes you have worked through problem set.

• Problem sets graded on 0,1,2 scale

1/27/2009 CS152Spring’09 35

Collaboration Policy

• Can collaborate to understand problem sets, but must turn in own solution. Some problems repeated from earlier years - do not copy solutions. Quiz problems will not be repeated…

• Each student must complete directed portion of the lab by themselves. OK to collaborate to understand how to run labs

– Class news group info on web site.

• Can work in group of up to 3 students for openended portion of each lab

– OK to be in different group for each lab -just make sure to label participants’ names clearly on each turned in lab section

CS152Spring’09

1/27/2009 36

MIPS Controller V2

Opcode ext op-group absolute input encoding reduces ROM height

 PC (state)

 JumpType =

next | spin

| fetch | dispatch

| feqz | fnez address

Control ROM data

 PC  PC+1

+1

 PCSrc jump logic zero busy next-state encoding reduces ROM width

37 1/27/2009

Control Signals (17)

CS152Spring’09

Jump Logic

 PCSrc = Case  JumpTypes next spin fetch  dispatch  feqz

 fnez

 PC+1 if (busy) then  PC else  PC+1 absolute op-group if (zero) then absolute else  PC+1 if (zero) then  PC+1 else absolute

CS152Spring’09

1/27/2009 38

Instruction Fetch & ALU:

MIPS-Controller-2

State fetch

0 fetch

1 fetch

2 fetch

3

...

ALU

ALU

0

ALU

1

2

ALUi

0

ALUi

1

ALUi

2

Control points

MA  PC

IR  Memory

A  PC

PC  A + 4

A  Reg[rs]

B  Reg[rt]

Reg[rd]  func(A,B)

A

B

Reg[rs] sExt

16

(Imm)

Reg[rd]  Op(A,B) next-state next spin next dispatch next next fetch next next fetch

CS152Spring’09

1/27/2009 39

Load & Store:

MIPS-Controller-2

State

LW

0

LW

1

LW

2

LW

3

LW

4

SW

SW

SW

SW

0

1

2

SW

3

4

Control points next-state

A  Reg[rs]

B  sExt

16 next

(Imm) next

MA  A+B next

Reg[rt]  Memory spin fetch

A  Reg[rs]

B  sExt next

(Imm) next

MA  A+B

16 next

Memory  Reg[rt] spin fetch

CS152Spring’09

1/27/2009 40

Branches:

MIPS-Controller-2

State

BEQZ

0

BEQZ

1

BEQZ

2

BEQZ

3

BEQZ

4

BNEZ

0

BNEZ

1

BNEZ

2

BNEZ

3

BNEZ

4

Control points next-state

A

A

B

Reg[rs]

PC sExt

16

PC  A+B next fnez next

(Imm<<2) next fetch

A

A

B

Reg[rs]

PC sExt

16

PC  A+B next feqz next

(Imm<<2) next fetch

CS152Spring’09

1/27/2009 41

Jumps:

MIPS-Controller-2

State

J

0

J

1

J

2

JR

0

JR

1

JAL

0

JAL

1

JAL

2

JAL

3

JALR

0

JALR

1

JALR

2

JALR

3

Control points next-state

A  PC

B  IR next next

PC  JumpTarg(A,B) fetch

A  Reg[rs]

PC  A next fetch

A  PC

Reg[31]  A next next

B  IR next

PC  JumpTarg(A,B) fetch

A  PC

B  Reg[rs]

Reg[31]  A

PC  B next next next fetch

CS152Spring’09

1/27/2009 42

VAX 11-780 Microcode

1/27/2009 CS152Spring’09 43

Implementing Complex Instructions

Opcode zero?

ldIR OpSel ldA ldB 32(PC) ldMA busy

2

IR

ExtSel

2 enImm

Imm

Ext

A

ALU control enALU

ALU

B addr

32 GPRs

+ PC ...

3

RegSel

32-bit Reg data

RegWrt enReg

MA addr

Memory

MemWrt data enMem

Bus 32 rd  M[(rs)] op (rt)

M[(rd)]  (rs) op (rt)

M[(rd)]  M[(rs)] op M[(rt)]

1/27/2009 CS152Spring’09

Reg-Memory-src ALU op

Reg-Memory-dst ALU op

Mem-Mem ALU op

44

Mem-Mem ALU Instructions:

MIPS-Controller-2

Mem-Mem ALU op M[(rd)]  M[(rs)] op M[(rt)]

ALUMM

ALUMM

ALUMM

ALUMM

ALUMM

ALUMM

ALUMM

0

1

2

3

4

5

6

MA  Reg[rs]

A  Memory

MA  Reg[rt]

B  Memory next spin next spin

MA  Reg[rd] next

Memory  func(A,B) spin fetch

Complex instructions usually do not require datapath modifications in a microprogrammed implementation

-- only extra space for the control program

Implementing these instructions using a hardwired controller is difficult without datapath modifications

1/27/2009 CS152Spring’09 45

Performance Issues

Microprogrammed control

 multiple cycles per instruction

Cycle time ? t

C

> max(t reg-reg

, t

ALU

, t

 ROM

)

Suppose 10 * t

 ROM

< t

RAM

Good performance, relative to a single-cycle hardwired implementation, can be achieved even with a CPI of 10

CS152Spring’09

1/27/2009 46

Horizontal vs Vertical

Code

Bits per  Instruction

#  Instructions

• Horizontal  code has wider

 instructions

– Multiple parallel operations per  instruction

– Fewer steps per macroinstruction

– Sparser encoding  more bits

• Vertical  code has narrower

 instructions

– Typically a single datapath operation per  instruction

– separate  instruction for branches

– More steps to per macroinstruction

– More compact  less bits

• Nanocoding

– Tries to combine best of horizontal and vertical  code

1/27/2009 CS152Spring’09 47

Nanocoding

Exploits recurring control signal patterns in  code, e.g.,

ALU

0

...

ALUi

...

0

A  Reg[rs]

A  Reg[rs]

 PC (state)

 address

 code ROM nanoaddress nanoinstruction ROM data

 code next-state

• MC68000 had 17-bit  code containing either 10-bit

 jump or

9-bit nanoinstruction pointer

– Nanoinstructions were 68 bits wide, decoded to give 196 control signals

1/27/2009 CS152Spring’09 48

Microprogramming in IBM 360

M30 M40 M50 M65

Datapath width

(bits)

 inst width

(bits)

 code size

(K µinsts)

 store technology

 store cycle

(ns) memory cycle

(ns)

Rental fee

($K/month)

8

50

4

16

52

4

32

85

2.75

CCROS TCROS BCROS BCROS

750

1500

4

625

2500

7

500

2000

15

64

87

2.75

200

750

35

Only the fastest models (75 and 95) were hardwired

1/27/2009 CS152Spring’09 49

Microcode Emulation

• IBM initially miscalculated the importance of software compatibility with earlier models when introducing the

360 series

• Honeywell stole some IBM 1401 customers by offering translation software (“Liberator”) for Honeywell H200 series machine

• IBM retaliated with optional additional microcode for 360 series that could emulate IBM 1401 ISA, later extended for IBM 7000 series

– one popular program on 1401 was a 650 simulator, so some customers ran many 650 programs on emulated 1401s

– ( 650 simulated on 1401 emulated on 360)

CS152Spring’09

1/27/2009 50

Microprogramming thrived in the

Seventies

• Significantly faster ROMs than DRAMs were available

• For complex instruction sets, datapath and controller were cheaper and simpler

• New instructions , e.g., floating point, could be supported without datapath modifications

• Fixing bugs in the controller was easier

• ISA compatibility across various models could be achieved easily and cheaply

1/27/2009

Except for the cheapest and fastest machines, all computers were microprogrammed

CS152Spring’09 51

Writable Control Store (WCS)

• Implement control store in RAM not ROM

– MOS SRAM memories now almost as fast as control store (core memories/DRAMs were 2-10x slower)

– Bug-free microprograms difficult to write

• User-WCS provided as option on several minicomputers

– Allowed users to change microcode for each processor

• User-WCS failed

– Little or no programming tools support

– Difficult to fit software into small space

– Microcode control tailored to original ISA, less useful for others

– Large WCS part of processor state - expensive context switches

– Protection difficult if user can change microcode

– Virtual memory required restartable microcode

CS152Spring’09

1/27/2009 52

Microprogramming: early Eighties

• Evolution bred more complex micro-machines

– Complex instruction sets led to need for subroutine and call stacks in µcode

– Need for fixing bugs in control programs was in conflict with read-only nature of µROM

– --> WCS (B1700, QMachine, Intel i432, …)

• With the advent of VLSI technology assumptions about ROM &

RAM speed became invalid -> more complexity

• Better compilers made complex instructions less important.

• Use of numerous micro-architectural innovations, e.g., pipelining, caches and buffers, made multiple-cycle execution of reg-reg instructions unattractive

• Looking ahead to RISC next time

– Use chip area to build fast instruction cache of user-visible vertical microinstructions - use software subroutine not hardware microroutines

– Use simple ISA to enable hardwired pipelined implementation

1/27/2009 CS152Spring’09 53

Modern Usage

Microprogramming is far from extinct

• Played a crucial role in micros of the Eighties

DEC uVAX, Motorola 68K series, Intel 386 and 486

• Microcode pays an assisting role in most modern micros

(AMD Opteron, Intel Core 2 Duo, Intel Atom, IBM PowerPC)

• Most instructions are executed directly, i.e., with hard-wired control

• Infrequently-used and/or complicated instructions invoke the microcode engine

Patchable microcode common for post-fabrication bug fixes, e.g. Intel processors load µcode patches at bootup

CS152Spring’09

1/27/2009 54

Acknowledgements

• These slides contain material developed and copyright by:

– Arvind (MIT)

– Krste Asanovic (MIT/UCB)

– Joel Emer (Intel/MIT)

– James Hoe (CMU)

– John Kubiatowicz (UCB)

– David Patterson (UCB)

• MIT material derived from course 6.823

• UCB material derived from course CS252

CS152Spring’09

1/27/2009 55

Download