
CSC 2400

Computer Systems I

Lecture 4

Processor Architecture

The Stored Program Computer









1943: ENIAC

– Presper Eckert and John Mauchly -- first general-purpose electronic computer.
  (or was it John V. Atanasoff in 1939?)
– Hard-wired program -- settings of dials and switches.

1944: Beginnings of EDVAC

– among other improvements, includes program stored in memory

1945: John von Neumann

– wrote a report on the stored program concept, known as the First Draft of a Report on EDVAC

The basic structure proposed in the draft became known as the “von Neumann machine” (or model).

– a memory, containing instructions and data
– a processing unit, for performing arithmetic and logical operations
– a control unit, for interpreting instructions

For more history, see http://www.maxmon.com/history.htm

Von Neumann Model

[Figure: von Neumann model block diagram; surviving labels: MAR, MDR, PC, IR]

Memory







2^k x m array of stored bits

Address

– unique ( k -bit) identifier of location

Contents

– m -bit value stored in location







Basic Operations:

LOAD

– read a value from a memory location

STORE

– write a value to a memory location

[Figure: memory with 4-bit addresses 0000 through 1111; example contents 00101101 and 10100010]

Interface to Memory







How does processing unit get data to/from memory?

MAR: Memory Address Register

MDR: Memory Data Register

To LOAD a location (A):

1. Write the address (A) into the MAR.
2. Send a “read” signal to the memory.
3. Read the data from MDR.

To STORE a value (X) to a location (A):

1. Write the data (X) to the MDR.
2. Write the address (A) into the MAR.
3. Send a “write” signal to the memory.
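The LOAD and STORE sequences can be sketched as a toy simulation. This is only an illustration: the class name `Memory`, the helper names, and the default sizes (k = 4, m = 8, matching the 4-bit addresses and 8-bit contents in the memory figure) are assumptions, not part of the lecture.

```python
class Memory:
    """Toy 2^k x m memory that is accessed only through MAR and MDR."""

    def __init__(self, k=4, m=8):
        self.k, self.m = k, m
        self.cells = [0] * (2 ** k)   # 2^k locations, each holding m bits
        self.mar = 0                  # Memory Address Register
        self.mdr = 0                  # Memory Data Register

    def read_signal(self):
        # "read": copy the addressed location into the MDR
        self.mdr = self.cells[self.mar]

    def write_signal(self):
        # "write": copy the MDR into the addressed location
        self.cells[self.mar] = self.mdr


def load(mem, address):
    mem.mar = address        # 1. write the address into the MAR
    mem.read_signal()        # 2. send a "read" signal
    return mem.mdr           # 3. read the data from the MDR


def store(mem, address, value):
    mem.mdr = value & (2 ** mem.m - 1)   # 1. write the data into the MDR
    mem.mar = address                    # 2. write the address into the MAR
    mem.write_signal()                   # 3. send a "write" signal


mem = Memory()
store(mem, 0b0010, 0b00101101)
print(format(load(mem, 0b0010), "08b"))  # prints 00101101
```

The processing unit never touches `cells` directly; everything goes through the two registers, which is the point of the interface.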

ALU or Processing Unit







Functional Units

– ALU = Arithmetic and Logic Unit
– could have many functional units, some of them special-purpose (multiply, square root, …)

Registers

– Small, temporary storage
– Operands and results of functional units

Word Size

– number of bits normally processed by ALU in one instruction
– also width of registers

Input and Output

 Devices for getting data into and out of computer memory

 Each device has its own interface, usually a set of registers like the memory’s MAR and MDR

– keyboard: data register (KBDR) and status register (KBSR)
– monitor: data register (DDR) and status register (DSR)

INPUT: Keyboard, Mouse, Scanner, Disk

OUTPUT: Monitor, Printer, LED, Disk

 Some devices provide both input and output
– disk, network

 Program that controls access to a device is usually called a driver.

Control Unit

 Orchestrates execution of the program

[Figure: control unit containing the PC and IR]

 Instruction Register (IR) contains the current instruction.

 Program Counter (PC) contains the address of the next instruction to be executed.

 Control unit:
– reads an instruction from memory
   the instruction’s address is in the PC
– interprets the instruction, generating signals that tell the other components what to do
   an instruction may take many machine cycles to complete

Logic Design

Overview of Logic Design





Fundamental Hardware Requirements

– Communication
   How to get values from one place to another
– Computation
– Storage

Bits are Our Friends

– Everything expressed in terms of values 0 and 1
– Communication
   Low or high voltage on wire
– Computation
   Compute Boolean functions
– Storage
   Store bits of information

Digital Signals

[Figure: voltage vs. time waveform read as the bit values 0, 1, 0]

– Use voltage thresholds to extract discrete values from continuous signal
– Simplest version: 1-bit signal
   Either high range (1) or low range (0)
   With guard range between them
– Not strongly affected by noise or low quality circuit elements
   Can make circuits simple, small, and fast

Computing with Logic Gates

And: out = a && b

Or: out = a || b

Not: out = !a

– Outputs are Boolean functions of inputs
– Respond continuously to changes in inputs
   With some small delay

[Figure: voltage vs. time for inputs a and b and output a && b, showing the rising delay and the falling delay]

Bit Equality

[Figure: bit-equal circuit with inputs a and b and output eq]

HCL Expression: bool eq = (a&&b)||(!a&&!b)

– Generate 1 if a and b are equal

Hardware Control Language (HCL)

– Very simple hardware description language
   Boolean operations have syntax similar to C logical operations
– We’ll use it to describe control logic for processors

Word Equality

[Figure: 32 bit-equal circuits compare b31/a31, b30/a30, ..., b1/a1, b0/a0, producing eq31, eq30, ..., eq1, eq0, which are combined into Eq]

Word-Level Representation

[Figure: block "=" with inputs A and B and output Eq]

HCL Representation: bool Eq = (A == B)

– 32-bit word size
– HCL representation
   Equality operation
   Generates Boolean value
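The bit-level and word-level equality circuits can be mimicked in software. The sketch below is illustrative only; the function names `bit_eq` and `word_eq` are made up here, and Python booleans stand in for wire values.

```python
def bit_eq(a, b):
    # Same structure as the HCL expression: (a && b) || (!a && !b)
    return (a and b) or ((not a) and (not b))


def word_eq(A, B, width=32):
    # One bit-equal circuit per bit position; Eq is 1 only if every
    # per-bit output eq_i is 1.
    return all(
        bit_eq(bool((A >> i) & 1), bool((B >> i) & 1))
        for i in range(width)
    )


print(word_eq(0x12345678, 0x12345678))  # True
print(word_eq(0x12345678, 0x12345679))  # False
```

At the word level HCL hides this per-bit structure behind the single expression `A == B`.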

1-Bit Latch

[Figure: D latch with inputs D (Data) and C (Clock), internal R and S signals, and outputs Q+ and Q–; two panels: Latching (clock = 1, outputs follow d) and Storing (clock = 0, outputs hold q)]

Registers

[Figure: structure of an 8-bit register: eight latches, each with D and C inputs and a Q+ output, with inputs i7 ... i0, outputs o7 ... o0 and a shared Clock; also drawn as one block with input I, output O and Clock]

– Stores word of data
   Different from program registers seen in assembly code
– Collection of edge-triggered latches
– Loads input on rising edge of clock

Random-Access Memory

[Figure: register file with read ports A (srcA, valA) and B (srcB, valB), write port W (dstW, valW), and Clock]

– Stores multiple words of memory
   Address input specifies which word to read or write
– Register file
   Holds values of program registers
   %eax, %esp, etc.
   Register identifier serves as address
    – ID 8 implies no read or write performed
– Multiple Ports
   Can read and/or write multiple words in one cycle
    – Each has separate address and data input/output
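A register file with two read ports and one write port might be modeled as below. This is a sketch under assumptions: the class and method names are invented, the file is assumed to hold eight registers (IDs 0 to 7), and returning 0 for a read with ID 8 is a choice made here for illustration only.

```python
RNONE = 8  # register ID 8: no read or write performed


class RegisterFile:
    def __init__(self):
        self.regs = [0] * 8   # assumed: eight program registers, IDs 0-7

    def read(self, srcA, srcB):
        # Two read ports, each with its own address input and data output.
        valA = self.regs[srcA] if srcA != RNONE else 0
        valB = self.regs[srcB] if srcB != RNONE else 0
        return valA, valB

    def clock(self, dstW, valW):
        # Write port: the addressed register is updated on the clock edge.
        if dstW != RNONE:
            self.regs[dstW] = valW


rf = RegisterFile()
rf.clock(0, 42)            # write 42 to register 0
print(rf.read(0, RNONE))   # (42, 0)
```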

Basic Logic Gates

NOTE: okay to use just a circle for NOT.

More than 2 Inputs?

 AND/OR can take any number of inputs.

– AND = 1 if all inputs are 1.
– OR = 1 if any input is 1.
– Similar for NAND/NOR.

 Can implement with multiple two-input gates

Logical Completeness

 Can implement ANY truth table with AND, OR, NOT.

A B C | D
0 0 0 | 0
0 0 1 | 0
0 1 0 | 1
0 1 1 | 0
1 0 0 | 0
1 0 1 | 1
1 1 0 | 0
1 1 1 | 0

1. AND combinations that yield a "1" in the truth table.

2. OR the results of the AND gates.
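The two-step recipe (AND terms for the rows that yield 1, then OR them) can be sketched as a small function that builds a circuit from any truth table. The names `TABLE` and `sum_of_products` are illustrative, not from the lecture.

```python
from itertools import product

# Truth table from the slide: inputs (A, B, C) -> output D
TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 0, (1, 0, 1): 1, (1, 1, 0): 0, (1, 1, 1): 0,
}


def sum_of_products(table):
    # Step 1: keep one AND term per row whose output is 1.
    rows = [inputs for inputs, out in table.items() if out == 1]

    def circuit(*inputs):
        # An AND term is 1 only when every input matches its row:
        # the input is used directly for a 1 and through NOT for a 0.
        and_terms = [
            all(x if want else not x for x, want in zip(inputs, row))
            for row in rows
        ]
        # Step 2: OR the results of the AND gates.
        return int(any(and_terms))

    return circuit


D = sum_of_products(TABLE)
for inputs in product((0, 1), repeat=3):
    assert D(*inputs) == TABLE[inputs]
```

For this table the result corresponds to D = (!A && B && !C) || (A && !B && C).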

Practice

 Implement the following truth table.

A B C

0 0 0

0 1 1

1 0 1

1 1 0


DeMorgan's Law





Converting AND to OR (with some help from NOT)

Consider the following gate:

A B | !A !B | !A && !B | !(!A && !B)
0 0 |  1  1 |     1    |      0
0 1 |  1  0 |     0    |      1
1 0 |  0  1 |     0    |      1
1 1 |  0  0 |     0    |      1

The last column is the same as A || B.

To convert AND to OR (or vice versa), invert inputs and output.
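Because there are only four input combinations, DeMorgan's Law can be checked exhaustively. The helper names below are made up for this sketch.

```python
from itertools import product


def and_with_inversions(a, b):
    # AND gate with inverted inputs and inverted output
    return not ((not a) and (not b))


def or_with_inversions(a, b):
    # OR gate with inverted inputs and inverted output
    return not ((not a) or (not b))


for a, b in product((False, True), repeat=2):
    assert and_with_inversions(a, b) == (a or b)    # behaves like OR
    assert or_with_inversions(a, b) == (a and b)    # behaves like AND
```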

Decoder

 n inputs, 2^n outputs
– exactly one output is 1 for each possible input pattern

[Figure: 2-bit decoder]
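A decoder's behavior is easy to state in code. In this sketch the function name and the convention that the first list element is the most significant input bit are assumptions.

```python
def decoder(bits):
    """Map n input bits (most significant first) to a list of 2^n outputs."""
    n = len(bits)
    index = int("".join(str(b) for b in bits), 2)
    # Exactly one output is 1: the one selected by the input pattern.
    return [1 if i == index else 0 for i in range(2 ** n)]


print(decoder([1, 0]))  # [0, 0, 1, 0]
```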

Programming Wisdom

Solving Problems using a Computer

 Methodologies for creating computer programs that perform a desired function.





Problem Solving

– How do we figure out what to tell the computer to do?
– Convert problem statement into algorithm, using stepwise refinement.

Debugging

– How do we figure out why it didn’t work?
– Examining registers and memory, setting breakpoints, etc.

Time spent on the first can reduce time spent on the second!

Stepwise Refinement

 Also known as systematic decomposition .

 Start with problem statement:

“We wish to count the number of occurrences of a character in a file. The character in question is to be input from the keyboard; the result is to be displayed on the monitor.”

 Decompose task into a few simpler subtasks .

 Decompose each subtask into smaller subtasks , and these into even smaller subtasks , etc....

until you get to the machine instruction level.


Problem Statement

 Because problem statements are written in English, they are sometimes ambiguous and/or incomplete.

– Where is “file” located? How big is it, or how do I know when I’ve reached the end?
– How should final count be printed? A decimal number?
– If the character is a letter, should I count both upper-case and lower-case occurrences?

 How do you resolve these issues?
– Ask the person who wants the problem solved, or
– Make a decision and document it.

Three Basic Constructs

 There are three basic ways to decompose a task:

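[Figure: three flowcharts that decompose a Task. Sequential: Subtask 1, then Subtask 2. Conditional: test condition; True leads to Subtask 1, False leads to Subtask 2. Iterative: test condition; while True, do Subtask and test again; False exits.]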

Sequential

 Do Subtask 1 to completion, then do Subtask 2 to completion, etc.

Count and print the occurrences of a character in a file

Get character input from keyboard

Examine file and count the number of characters that match

Print number to the screen


Conditional

 If condition is true, do Subtask 1; else, do Subtask 2.

Test character. If match, increment counter.

[Flowchart: file char = input? True: Count = Count + 1. False: skip the increment.]

Iterative

 Do Subtask over and over, as long as the test condition is true.

Check each element of the file and count the characters that match.

[Flowchart: more chars to check? True: check next char and count if matches, then test again. False: exit.]

Problem Solving Skills

 Learn to convert problem statement into step-by-step description of subtasks.

– Like a puzzle, or a “word problem” from grammar school math.
   What is the starting state of the system?
   What is the desired ending state?
   How do we move from one state to another?
– Recognize English words that correlate to three basic constructs:
   “do A then do B” → sequential
   “if G, then do H” → conditional
   “for each X, do Y” → iterative
   “do Z until W” → iterative

Example: Counting Characters

Initial refinement: Big task into three sequential subtasks.

Before:

START
Input a character. Then scan a file, counting occurrences of that character. Finally, display on the monitor the number of occurrences of the character (up to 9).
STOP

After:

START
A. Initialize: Put initial values into all locations that will be needed to carry out this task.
   - Input a character.
   - Set up a pointer to the first location of the file that will be scanned.
   - Get the first character from the file.
   - Zero the register that holds the count.
B. Scan the file, location by location, incrementing the counter if the character matches.
C. Display the count on the monitor.
STOP

Refining B

B: Scan the file, location by location, incrementing the counter if the character matches.

Refining B into iterative construct:

[Flowchart: Done? Yes: exit. No: do B1, then test again.]

B1: Test character. If a match, increment counter. Get next character.

Refining B1

B1: Test character. If a match, increment counter. Get next character.

Refining B1 into sequential subtasks:

[Flowchart: Done? Yes: exit. No: do B2, then B3, then test again.]

B2: Test character. If matches, increment counter.

B3: Get next character.

Refining B2 and B3

B2: Test character. If matches, increment counter.

B3: Get next character.

Conditional (B2) and sequential (B3):

[Flowchart: Done? Yes: exit. No: do B2, then B3, then test again.]

B2: R1 = R0? Yes: R2 = R2 + 1. No: skip the increment.

B3: R3 = R3 + 1, then R1 = M[R3]
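The fully refined flowchart maps almost line for line onto a loop. The sketch below keeps the register names from the slides (R0 = input character, R1 = current file character, R2 = count, R3 = file pointer); using a Python list as memory `M` and the EOT code 0x04 as the end-of-file marker are assumptions made for illustration.

```python
EOT = 0x04  # assumed end-of-text marker that terminates the "file"


def count_char(M, start, target):
    R0 = target      # character to look for
    R2 = 0           # zero the register that holds the count
    R3 = start       # pointer to the first location of the file
    R1 = M[R3]       # get the first character from the file
    while R1 != EOT:         # Done?
        if R1 == R0:         # B2: R1 = R0?
            R2 = R2 + 1
        R3 = R3 + 1          # B3: increment the file pointer
        R1 = M[R3]           #     and get the next character
    return R2


memory = [ord(c) for c in "banana"] + [EOT]
print(count_char(memory, 0, ord("a")))  # 3
```

Each construct from the refinement is visible: the `while` is the iterative step, the `if` is the conditional B2, and the two assignments after it are the sequential B3.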

The Last Step: Instructions

 Write code (C, assembly, Java) for each step

[Flowchart: Done?; B2: R1 = R0? then R2 = R2 + 1; B3: R3 = R3 + 1, R1 = M[R3]]

; Look at each char in file.
0001100001111100 ; is R1 = EOT?
0000010xxxxxxxxx ; if so, exit loop
; Check for match with R0.
1001001001111111 ; R1 = -char
0001001001100001
0001001000000001 ; R1 = R0 – char
0000101xxxxxxxxx ; no match, skip incr
0001010010100001 ; R2 = R2 + 1
; Incr file ptr and get next char
0001011011100001 ; R3 = R3 + 1
0110001011000000 ; R1 = M[R3]

Don’t know PCoffset bits until all the code is done

Types of Errors in Code







Syntax Errors

– You made a typing error that resulted in an illegal operation.
– Not usually an issue with machine language, because almost any bit pattern corresponds to some legal instruction.
– In high-level languages, these are often caught during the translation from language to machine code.

Logic Errors

– Your program is legal, but wrong, so the results don’t match the problem statement.
– Trace the program to see what’s really happening and determine how to get the proper behavior.

Data Errors

– Input data is different from what you expected.
– Test the program with a wide variety of inputs.

Instruction Set Architecture

Instruction





The instruction is the fundamental unit of work.

Specifies two things:
– opcode: operation to be performed
– operands: data/locations to be used for operation

 An instruction is encoded as a sequence of bits. (Just like data!)
– Often, but not always, instructions have a fixed length, such as 16 or 32 bits.
– Control unit interprets instruction: generates sequence of control signals to carry out operation.
– Operation is either executed completely, or not at all.

 A computer’s instructions and their formats are known as its Instruction Set Architecture (ISA).

Instruction Set Architecture





Assembly Language View

– Processor state
   Registers, memory, …
– Instructions
   addl, movl, leal, …
   How instructions are encoded as bytes

Layer of Abstraction

– Above: how to program machine
   Processor executes instructions in a sequence
– Below: what needs to be built
   Use variety of tricks to make it run fast
   E.g., execute multiple instructions simultaneously

[Figure: abstraction layers, top to bottom: Application Program; Compiler, OS; ISA; CPU Design; Circuit Design; Chip Layout]

Instruction Set Design Issues

 Instruction set design issues include:

– Where are operands stored?
   registers, memory, stack, accumulator
– How many explicit operands are there?
   0, 1, 2, or 3
– How is the operand location specified?
   register, immediate, indirect, . . .
– What type & size of operands are supported?
   byte, int, float, double, string, vector. . .
– What operations are supported?
   add, sub, mul, move, compare . . .

Instruction Set Architectures

Basic ISA Classes

The results of different address classes are easiest to see with the examples here, all of which implement the sequence for C = A + B.

Stack     Accumulator   Register-Memory   Load-Store
Push A    Load A        Load R1, A        Load R1, A
Push B    Add B         Add R1, B         Load R2, B
Add       Store C       Store C, R1       Add R3, R1, R2
Pop C                                     Store C, R3

Load-Store is the class that won out. The more registers on the CPU, the better.
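To make the four classes concrete, each instruction sequence for C = A + B can be transcribed into a tiny Python function. Memory is modeled as a dict keyed by variable name; the function names and this memory model are assumptions for illustration only.

```python
def run_stack(mem):
    stack = []
    stack.append(mem["A"])                    # Push A
    stack.append(mem["B"])                    # Push B
    stack.append(stack.pop() + stack.pop())   # Add (operands are implicit)
    mem["C"] = stack.pop()                    # Pop C


def run_accumulator(mem):
    acc = mem["A"]        # Load A
    acc += mem["B"]       # Add B  (one explicit operand)
    mem["C"] = acc        # Store C


def run_register_memory(mem):
    R = {}
    R[1] = mem["A"]       # Load R1, A
    R[1] += mem["B"]      # Add R1, B  (ALU operand comes from memory)
    mem["C"] = R[1]       # Store C, R1


def run_load_store(mem):
    R = {}
    R[1] = mem["A"]       # Load R1, A
    R[2] = mem["B"]       # Load R2, B
    R[3] = R[1] + R[2]    # Add R3, R1, R2  (registers only)
    mem["C"] = R[3]       # Store C, R3


for run in (run_stack, run_accumulator, run_register_memory, run_load_store):
    mem = {"A": 2, "B": 3}
    run(mem)
    assert mem["C"] == 5
```

Only the load-store version keeps memory accesses out of the arithmetic instruction, at the cost of one extra instruction.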

Types of Addressing Modes

Addressing Mode          Example               Action
1. Register              Add R4, R3            R4 <- R4 + R3
2. Immediate             Add R4, #3            R4 <- R4 + 3
3. Displacement          Add R4, 100(R1)       R4 <- R4 + M[100 + R1]
4. Register indirect     Add R4, (R1)          R4 <- R4 + M[R1]
5. Indexed               Add R4, (R1 + R2)     R4 <- R4 + M[R1 + R2]
6. Direct or absolute    Add R4, (1000)        R4 <- R4 + M[1000]
7. Memory Indirect       Add R4, @(R3)         R4 <- R4 + M[M[R3]]
8. Autoincrement         Add R4, (R2)+         R4 <- R4 + M[R2]; R2 <- R2 + d
9. Autodecrement         Add R4, (R2)-         R4 <- R4 + M[R2]; R2 <- R2 - d
10. Scaled               Add R4, 100(R2)[R3]   R4 <- R4 + M[100 + R2 + R3*d]

Modes 1-4 account for 93% of all operands
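The Action column can be read as a recipe for finding the second operand of `Add R4, <operand>`. The sketch below covers the modes without side effects (autoincrement and autodecrement are left out because they also modify R2). The function name, the keyword field names, and the sample register and memory contents are all hypothetical.

```python
def operand(mode, R, M, **f):
    """Return the second operand of `Add R4, <operand>` for one mode.

    R: register dict, M: memory dict, f: fields of the operand specifier.
    """
    if mode == "register":           # Add R4, R3
        return R[f["reg"]]
    if mode == "immediate":          # Add R4, #3
        return f["imm"]
    if mode == "displacement":       # Add R4, 100(R1)
        return M[f["disp"] + R[f["base"]]]
    if mode == "register_indirect":  # Add R4, (R1)
        return M[R[f["base"]]]
    if mode == "indexed":            # Add R4, (R1 + R2)
        return M[R[f["base"]] + R[f["index"]]]
    if mode == "direct":             # Add R4, (1000)
        return M[f["addr"]]
    if mode == "memory_indirect":    # Add R4, @(R3)
        return M[M[R[f["base"]]]]
    if mode == "scaled":             # Add R4, 100(R2)[R3]
        return M[f["disp"] + R[f["base"]] + R[f["index"]] * f["d"]]
    raise ValueError(mode)


R = {1: 8, 2: 4, 3: 2, 4: 10}
M = {2: 8, 8: 7, 12: 5, 108: 1, 112: 9, 1000: 6}

print(operand("displacement", R, M, disp=100, base=1))      # M[108] -> 1
print(operand("memory_indirect", R, M, base=3))             # M[M[2]] -> 7
print(operand("scaled", R, M, disp=100, base=2, index=3, d=4))  # M[112] -> 9
```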

Types of Operations

 Arithmetic and Logic: AND, ADD





 Data Transfer: MOVE, LOAD, STORE
 Control: BRANCH, JUMP, CALL
 System: OS CALL, VM
 Floating Point: ADDF, MULF, DIVF
 Decimal: ADDD, CONVERT
 String: MOVE, COMPARE
 Graphics: (DE)COMPRESS

Role of Compilers

 What does the compiler do?

– Translate HLL to machine lang, optimize, check for errors

 Optimizations

– Generic high-level: common subexpression, strength reduction, “machine independent”
– Local: within a straight-line code fragment (a “block”)
– Global: cross branches, transform loops
– Register allocation: associate registers with operands
– Machine-dependent: tune to the specific architecture (or ISA)

Role of Compilers (cont’d)

 Impact of optimization on performance



– Goal is to improve
– Sometimes makes worse, or not better

 How to make the compiler writer’s life easier
– Make frequent case fast, rare case correct
– Make things uniform
– Reduce tradeoffs, have one “best” way of doing each thing
– Allow for constant values

CISC Instruction Sets









Complex Instruction Set Computer
– Dominant style through mid-80’s

Stack-oriented instruction set
– Use stack to pass arguments, save program counter
– Explicit push and pop instructions

Arithmetic instructions can access memory
– addl %eax, 12(%ebx,%ecx,4)
   requires memory read and write
   Complex address calculation

Condition codes
– Set as side effect of arithmetic and logical instructions

Philosophy
– Add instructions to perform “typical” programming tasks

RISC Instruction Sets







Reduced Instruction Set Computer
– Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley)

Fewer, simpler instructions
– Might take more to get given task done
– Can execute them with small and fast hardware

Register-oriented instruction set
– Many more (typically 32) registers
– Use for arguments, return pointer, temporaries

Only load and store instructions can access memory
– Similar to Y86 mrmovl and rmmovl

No Condition codes
– Test instructions return 0/1 in register

Example RISC Instruction Formats

Register-Register (R-type): ADD R1, R2, R3

Bit boundaries:  31 | 26 25 | 21 20 | 16 15 | 11 10 | 6 5 | 0
Fields:          Op, rs1, rs2, rd, func

(ALU reg. operations, read/write special registers and moves)

Register-Immediate (I-type): SUB R1, R2, #3

Bits    31-26   25-21   20-16   15-0
Field   Op      rs1     rd      immediate

(ALU imm. operations, loads and stores, conditional branch, jump (and link))

Jump / Call (J-type): JUMP end

Bits    31-26   25-0
Field   Op      offset added to PC

(jump, jump and link, trap and return from exception)
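Packing and unpacking the I-type fields is a matter of shifts and masks. The sketch below follows the bit positions on the slide (Op in bits 31-26, rs1 in 25-21, rd in 20-16, immediate in 15-0); the function names and the opcode value `SUB_IMM` are hypothetical, since the slide does not give opcode encodings.

```python
def encode_itype(op, rs1, rd, imm):
    # Op: bits 31-26, rs1: bits 25-21, rd: bits 20-16, immediate: bits 15-0
    return (op << 26) | (rs1 << 21) | (rd << 16) | (imm & 0xFFFF)


def decode_itype(word):
    return {
        "op": (word >> 26) & 0x3F,    # 6-bit opcode
        "rs1": (word >> 21) & 0x1F,   # 5-bit source register
        "rd": (word >> 16) & 0x1F,    # 5-bit destination register
        "imm": word & 0xFFFF,         # 16-bit immediate (not sign-extended here)
    }


SUB_IMM = 0b001010  # hypothetical opcode, for illustration only

word = encode_itype(SUB_IMM, rs1=2, rd=1, imm=3)   # SUB R1, R2, #3
print(format(word, "032b"))
print(decode_itype(word))
```

Because every field sits at a fixed position, the hardware can extract register numbers before it has finished decoding the opcode.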

CISC vs. RISC





Original Debate

– Strong opinions!
– CISC proponents: easy for compiler, fewer code bytes
– RISC proponents: better for optimizing compilers, can make run fast with simple chip design

Current Status

– For desktop processors, choice of ISA not a technical issue
   With enough hardware, can make anything run fast
   Code compatibility more important
– For embedded processors, RISC makes sense
   Smaller, cheaper, less power

Sequential Processors

Instruction Processing

Fetch instruction from memory

Decode instruction

Evaluate address

Memory load or store

Write back result

Update Program Counter


Sequential HW Structure

State

– Program counter register (PC)
– Condition code register (CC)
– Register File
– Memories
   Access same memory space
   Data: for reading/writing program data
   Instruction: for reading instructions

Instruction Flow

– Read instruction at address specified by PC
– Process through stages
– Update program counter

[Figure: sequential hardware structure with stages Fetch, Decode, Execute, Memory, Write back and PC update; signal labels: icode, ifun, rA, rB, valC, valP, srcA, srcB, dstA, dstB, valA, valB, aluA, aluB, Bch, valE, Addr, Data, valM, newPC]

Sequential Stages

Fetch
– Read instruction from instruction memory

Decode
– Read program registers

Execute
– Compute value or address

Memory
– Read or write data

Write Back
– Write program registers

PC
– Update program counter

[Figure: stage diagram: Fetch, Decode, Execute, Memory, Write back, PC update]

Instruction Cycle

[Figure: the five phases: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back; register labels IR and LMD]

Passed To Next Stage:

IR <- Mem[PC]
NPC <- PC + 4

Instruction Fetch (IF):

Send out the PC and fetch the instruction from memory into the instruction register (IR); increment the PC by 4 to address the next sequential instruction.

IR holds the instruction that will be used in the next stage.

NPC holds the value of the next PC.


A <- Regs[IR6..IR10];
B <- Regs[IR11..IR15];
Imm <- ((IR16)^16 ## IR16..IR31)

Instruction Decode / Register Fetch (ID):

Decode the instruction and access the register file to read the registers.

The outputs of the general purpose registers are read into two temporary registers (A & B) for use in later clock cycles.

We extend the sign of the lower 16 bits of the Instruction Register: bit IR16 is replicated 16 times and concatenated (##) with IR16..IR31.
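Sign extension of the 16-bit immediate can be sketched as follows. The function name is made up, and the code uses the common convention that the immediate occupies the low-order 16 bits of the instruction word; the result is returned as a signed Python integer.

```python
def sign_extend_16(ir):
    imm = ir & 0xFFFF        # lower 16 bits of the instruction register
    if imm & 0x8000:         # sign bit of the 16-bit field is set
        imm -= 0x10000       # equivalent to replicating the sign bit 16 times
    return imm


print(sign_extend_16(0x0003))               # 3
print(sign_extend_16(0xFFFD))               # -3
print(hex(sign_extend_16(0xFFFD) & 0xFFFFFFFF))  # 0xfffffffd as a 32-bit pattern
```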

Instruction Decoding

[Figure: instruction bytes "5 0 | rA rB | D" split into icode, ifun, rA, rB, valC; the register byte and the constant word are optional]

Instruction Format

– Instruction byte: icode:ifun
– Optional register byte: rA:rB
– Optional constant word: valC
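Splitting an instruction into these fields might look like the sketch below. It assumes a 4-byte constant word stored in little-endian order and takes the "needs register byte" and "needs constant" decisions as parameters; both assumptions and the function name are illustrative rather than taken from the slide.

```python
def decode(code, need_regids, need_valC):
    """Split instruction bytes into icode, ifun, rA, rB, valC."""
    icode, ifun = code[0] >> 4, code[0] & 0xF      # instruction byte icode:ifun
    pos = 1
    rA = rB = None
    if need_regids:                                 # optional register byte rA:rB
        rA, rB = code[pos] >> 4, code[pos] & 0xF
        pos += 1
    valC = None
    if need_valC:                                   # optional constant word
        valC = int.from_bytes(code[pos:pos + 4], "little")
    return icode, ifun, rA, rB, valC


# Bytes 50 rA:rB D with rA = 1, rB = 2, D = 0x10
print(decode(bytes([0x50, 0x12, 0x10, 0x00, 0x00, 0x00]), True, True))
# (5, 0, 1, 2, 16)
```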


A <- A func. B
cond = 0;

Execute / Address Calculation (EX):

We perform an operation (for an ALU instruction) or an address calculation (if it’s a load or a branch).

If an ALU instruction, actually do the operation. If an address calculation, figure out how to obtain the address and stash away the location of that address for the next cycle.


A = Mem[prev. B] or
Mem[prev. B] = A

Memory Access (MEM):

If this is an ALU instruction, do nothing.

If a load or store, then access memory.


Regs <- A, B;
PC <- NPC

Write Back (WB):

Update the registers from either the ALU or from the data loaded.
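The five phases can be strung together into a toy interpreter. Everything about the instruction representation below is an assumption made for illustration: instructions are tuples `(op, rd, rs1, rs2_or_imm)`, only `add`, `load` and `store` exist, there are no branches, and the program is a dict from byte address to instruction.

```python
def run(program, regs, data):
    """Execute until the PC leaves the program; returns the register list."""
    pc = 0
    while pc in program:
        # IF: fetch the instruction and compute the next PC
        ir = program[pc]
        npc = pc + 4
        # ID: decode and read register operands
        op, rd, rs1, rs2_or_imm = ir
        a = regs[rs1]
        b = regs[rs2_or_imm] if op == "add" else None
        imm = rs2_or_imm
        # EX: ALU operation, or address calculation for load/store
        if op == "add":
            alu_out = a + b
        else:
            alu_out = a + imm
        # MEM: only loads and stores access data memory
        lmd = None
        if op == "load":
            lmd = data[alu_out]
        elif op == "store":
            data[alu_out] = regs[rd]
        # WB: update the register file, then the PC
        if op == "add":
            regs[rd] = alu_out
        elif op == "load":
            regs[rd] = lmd
        pc = npc
    return regs


program = {
    0: ("load", 1, 0, 100),    # R1 <- M[R0 + 100]
    4: ("load", 2, 0, 104),    # R2 <- M[R0 + 104]
    8: ("add", 3, 1, 2),       # R3 <- R1 + R2
    12: ("store", 3, 0, 108),  # M[R0 + 108] <- R3
}
data = {100: 2, 104: 3}
regs = run(program, [0, 0, 0, 0], data)
print(regs[3], data[108])  # 5 5
```

Every instruction walks through the same five steps even when a step has nothing to do, which is what makes the flow easy to pipeline later.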

Sequential Summary





Implementation

– Express every instruction as series of simple steps
– Follow same general flow for each instruction type
– Assemble registers, memories, predesigned combinational blocks
– Connect with control logic

Limitations

– Too slow to be practical
– In one cycle, must propagate through instruction memory, register file, ALU, and data memory
– Would need to run clock very slowly
– Hardware units only active for fraction of clock cycle

Pipelined Processors

What is Pipelining











Computers execute billions of instructions, so instruction throughput is what matters

IDEA: Divide instruction execution up into several pipeline stages. For example:

IF ID EX MEM WB

Simultaneously have different instructions in different pipeline stages

The length of the longest pipeline stage determines the cycle time

Desirable pipeline features (e.g., RISC):
– all instructions same length
– registers located in same place in instruction format
– memory operands only in loads or stores

What Is Pipelining

Laundry Example

 Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
 Washer takes 30 minutes
 Dryer takes 40 minutes
 “Folder” takes 20 minutes

What Is Pipelining

[Figure: timeline from 6 PM to midnight; tasks A, B, C, D in task order, each taking 30 + 40 + 20 minutes, one after another]

Sequential laundry takes 6 hours for 4 loads

If they learned pipelining, how long would laundry take?

What Is Pipelining

Start work ASAP

[Figure: timeline from 6 PM to midnight with tasks A, B, C, D overlapped in task order; stage times along the top: 30, 40, 40, 40, 40, 20 minutes]

 Pipelined laundry takes 3.5 hours for 4 loads
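Both laundry totals can be reproduced with a few lines of code. The function names are made up; the pipelined version assumes each machine handles one load at a time and that a load moves on as soon as both it and the next machine are free.

```python
def sequential_time(stages, loads):
    # Each load finishes all stages before the next load starts.
    return loads * sum(stages)


def pipelined_time(stages, loads):
    # done[s] = time at which stage s finished its most recent load
    done = [0] * len(stages)
    for _ in range(loads):
        ready = 0   # when this load is ready to enter the next stage
        for s, duration in enumerate(stages):
            start = max(ready, done[s])   # wait for the load and for the machine
            done[s] = ready = start + duration
    return done[-1]


stages = [30, 40, 20]   # washer, dryer, "folder" (minutes)
print(sequential_time(stages, 4) / 60)   # 6.0 hours
print(pipelined_time(stages, 4) / 60)    # 3.5 hours
```

The pipelined total is 30 + 4 x 40 + 20 = 210 minutes: after the pipeline fills, the 40-minute dryer sets the pace.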

What Is Pipelining

Pipelining Lessons

[Figure: pipelined laundry timeline from 6 PM for tasks A, B, C, D; stage times 30, 40, 40, 40, 40, 20 minutes]

 Pipelining doesn’t help latency of single task, it helps throughput of entire workload
 Pipeline rate limited by slowest pipeline stage
 Multiple tasks operating simultaneously
 Potential speedup = Number pipe stages
 Unbalanced lengths of pipe stages reduce speedup
 Time to “fill” pipeline and time to “drain” it reduce speedup

Real-World Pipelines: Car Washes

[Figure: three car wash layouts: Sequential, Parallel, Pipelined]

 Idea
– Divide process into independent stages
– Move objects through stages in sequence
– At any given time, multiple objects being processed

Pipeline Diagrams

 Unpipelined

OP1 | OP2 | OP3   (one after another along the Time axis)

– Cannot start new operation until previous one completes

 3-Way Pipelined

OP1: A B C
OP2:   A B C
OP3:     A B C
(Time runs left to right)

– Up to 3 operations in process simultaneously

Pipelining has issues!



Nonuniform delays – unpredictable reading from memory

 Structural hazards: Not enough HW to support this combination of instructions (single person to fold and put clothes away)

 Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock)

 Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).


Data Dependencies

[Figure: combinational logic whose output is captured in a register (Reg) driven by Clock; OP1, OP2, OP3 run one after another along the Time axis]

 System
– Each operation depends on result from preceding one

Data Hazards

[Figure: three combinational logic blocks A, B, C, each followed by a register (Reg) driven by Clock]

OP1: A B C
OP2:   A B C
OP3:     A B C
OP4:       A B C
(Time runs left to right)

– Result does not feed back around in time for next operation
– Pipelining has changed behavior of system

One Memory Port/Structural Hazards

[Figure: pipeline timing over clock cycles 1 to 7. In instruction order, Load and Instr 1 through Instr 4 each start one cycle after the previous one and pass through Ifetch, Reg, DMem, Reg. With one memory port, the DMem access of Load falls in the same cycle as the Ifetch of Instr 3.]

Three Generic Data Hazards

 Read After Write (RAW)

Instr J tries to read operand before Instr I writes it

I: add r1,r2,r3
J: sub r4,r1,r3

 Caused by a “Dependence” (in compiler nomenclature). This hazard results from an actual need for communication.


 Write After Read (WAR)

Instr J writes operand before Instr I reads it

I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7

 Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.


 Write After Write (WAW)

Instr J writes operand before Instr I writes it.

I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7

 Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”.
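The three cases can be told apart just by comparing register names. In this sketch an instruction is reduced to a hypothetical pair `(destination, sources)`; the function reports which dependences exist between an earlier instruction I and a later instruction J. Whether a dependence turns into an actual hazard depends on the pipeline, which this sketch does not model.

```python
def hazards(i, j):
    """i, j: (dest, sources) for an earlier instruction I and a later J."""
    i_dest, i_srcs = i
    j_dest, j_srcs = j
    found = set()
    if i_dest in j_srcs:
        found.add("RAW")   # J reads what I writes
    if j_dest in i_srcs:
        found.add("WAR")   # J writes what I reads
    if j_dest == i_dest:
        found.add("WAW")   # J writes what I writes
    return found


# I: add r1,r2,r3   J: sub r4,r1,r3
print(hazards(("r1", ("r2", "r3")), ("r4", ("r1", "r3"))))   # {'RAW'}
# I: sub r4,r1,r3   J: add r1,r2,r3
print(hazards(("r4", ("r1", "r3")), ("r1", ("r2", "r3"))))   # {'WAR'}
# I: sub r1,r4,r3   J: add r1,r2,r3
print(hazards(("r1", ("r4", "r3")), ("r1", ("r2", "r3"))))   # {'WAW'}
```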

Control Hazards

10: beq r1,r3,36

14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11

[Figure: each instruction passes through Ifetch, Reg, DMem, Reg, starting one cycle after the previous one]

What do you do with the 3 instructions in between?

Hands-on Example: timing diagram

 Write sequential timing diagram for:

instr        1  2  3  4  5  6  7  8  9
x = y + z    F  D  EX M  W
b = x + y
y = a + b
d = z + b
x = a + y

 Rewrite using forwarding, compare total time

 Rewrite using scheduling, compare total time