CS6461 – Computer Architecture Spring 2012 Stephen H. Kaisler, D

CS6461 – Computer Architecture

Fall 2015

Instructor Morris Lancaster

Adapted from Professor Stephen Kaisler’s Slides

Lecture 2 - Basic System Design

Hierarchical System Architecture

CS6461 Computer Architecture - 2014

Dept. of Computer Science

Technology Trends

• Processor

– logic capacity: 2 x increase in performance every 1.5 - 2 years;

– clock rate: about 25% per year

– overall performance: 1000 x in last decade

• Main Memory

– DRAM capacity: 2 x every 2 years; 1000 x size in last decade

– memory speed: about 10% per year

– cost / bit: improves about 25% per year

• Disk

– capacity: > 2 x increase in capacity every 1.5 years

– cost / bit: improves about 60% per year

– 120 x capacity in last decade

– Disk architecture not much different than IBM’s 10 MByte disks of the early 1980s

• Network Bandwidth

– Bandwidth: 1 Gbit/s standard to the desktop in many places

– Bandwidth: Probably 1 Tbit/s by end of decade, but may require new infrastructure

Intel Processor Evolution

Processor Clock Speed

Cost Per GFLOP

# Servers Comprising WWW

Technology Progress

Growth factors:

• Transistors/chip: >100,000 since 1971

• Disk density: >100,000,000 since 1956

• Disk speed: 12.5 since 1956

The disk speed barrier dominates everything!

[Figure: Compound Annual Growth Rate (axis from 0% to 45%) for Transistors/Chip since 1971, Disk Density since 1956, and Disk Speed since 1956]
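The growth factors on this slide can be turned into compound annual growth rates with CAGR = factor^(1/years) - 1. The following is a minimal sketch of that arithmetic, not the method used to produce the original chart. The class name `GrowthRate`, the method `cagr`, and the end year 2012 are assumptions; the slide does not say which year the factors were measured in.

```java
/** Sketch: convert a total growth factor into a compound annual growth rate. */
class GrowthRate {
    /** CAGR = factor^(1/years) - 1, returned as a fraction (0.25 means 25%). */
    static double cagr(double growthFactor, int years) {
        if (growthFactor <= 0 || years <= 0) {
            throw new IllegalArgumentException("factor and years must be positive");
        }
        return Math.pow(growthFactor, 1.0 / years) - 1.0;
    }

    public static void main(String[] args) {
        int endYear = 2012; // ASSUMPTION: the slide does not state the end year
        System.out.printf("Transistors/chip: %.1f%%%n", 100 * cagr(1e5, endYear - 1971));
        System.out.printf("Disk density:     %.1f%%%n", 100 * cagr(1e8, endYear - 1956));
        System.out.printf("Disk speed:       %.1f%%%n", 100 * cagr(12.5, endYear - 1956));
    }
}
```

Under the assumed end year, transistor count and disk density both come out at tens of percent per year, while disk speed stays below 5% per year, which is the gap the slide calls the disk speed barrier.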

The “1,000,000:1” disk-speed barrier

• RAM access times ~5-7.5 nanoseconds

– CPU clock period <1 nanosecond

– Interprocessor communication can be ~1,000X slower than on-chip

• Disk seek times ~2.5-3 milliseconds

– Limit = ½ rotation

– i.e., 1/30,000 minutes at 15,000 RPM

– i.e., 1/500 seconds = 2 ms

Tiering brings it closer to ~1,000:1 in practice, but even so the difference is VERY BIG
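The arithmetic behind the barrier can be checked with a few lines of code. This is a rough sketch under stated assumptions: the 15,000 RPM spindle speed is inferred from the 1/30,000-minute half rotation above, and the class and method names (`DiskBarrier`, `halfRotationMs`, `speedRatio`) are hypothetical.

```java
/** Sketch: half-rotation latency and the disk-to-RAM access time ratio. */
class DiskBarrier {
    /** Time for half a rotation, in milliseconds, at the given spindle speed. */
    static double halfRotationMs(double rpm) {
        double msPerRotation = 60_000.0 / rpm; // 60,000 ms in one minute
        return msPerRotation / 2.0;
    }

    /** How many times slower a disk access (in ms) is than a RAM access (in ns). */
    static double speedRatio(double diskMs, double ramNs) {
        return (diskMs * 1_000_000.0) / ramNs; // 1 ms = 1,000,000 ns
    }

    public static void main(String[] args) {
        System.out.println("Half rotation at 15,000 RPM (ms): " + halfRotationMs(15_000));
        System.out.println("3 ms seek vs 5 ns RAM:     " + speedRatio(3.0, 5.0));
        System.out.println("2.5 ms seek vs 7.5 ns RAM: " + speedRatio(2.5, 7.5));
    }
}
```

With the figures on this slide the ratio lands in the hundreds of thousands, which is the order of magnitude the "1,000,000:1" label refers to.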

State of the Art

• State-of-the-art PC (on your desk) now:

– Processor clock speed: ~4 GigaHertz

– Memory capacity: 2 to 8 GigaBytes (Windows 7 limits to 8 GBytes; Windows 8 limits to 128 GBytes on x64)

– Disk capacity: 1 TByte for <$79; 2 TBytes for <$129 – Wow!!

– In five years, we will need new units!

• Mega -> Giga -> Tera -> Peta -> Exa (Big Data!)

Intel 4004 Die Photo

• (2250 transistors, 12 mm², 108 kHz, 1971)

Intel 80486 Die Photo

• (1,200,000 transistors, 81 mm², 25 MHz, 1989)

Pentium Die Photo

• (3,100,000 transistors, 296 mm², 60 MHz, 1993)

I/O System Side

Each bus and adapter has its own specifications.

• Interfaces are where the problems are: between functional units, and between the computer and the outside world

• Need to design against constraints of performance, power, area, and cost

Issues

• Performance :

– the key to computing for most intensive problems

– what’s the secret? TIME, TIME, TIME (by analogy with real estate: Location, Location, Location)

• Response Time :

– How long does it take for my job/program to run?

– How long does it take to execute my job/program?

[NOTE: These are not equivalent. Why not?]

– How long must I wait for a database query?

• Throughput :

– How many jobs can the machine run at once?

– What is the average execution rate?

– How much work is getting done?

– How long does it take to handle an interrupt?

• Execution Times :

– Elapsed Time: counts everything: disk and memory accesses, I/O waits, etc. Sometimes a useful number, but not good for comparison purposes

– CPU Time: counts instruction execution times, but not I/O time; basis for MIPS/MFLOPS; often divided into system time and user time

• Q? What are MIPS and MFLOPS good measures of, if anything?
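The difference between elapsed time and CPU time listed above can be observed directly. The sketch below is illustrative only: it times a task that mostly waits (a sleep standing in for an I/O wait), using `System.nanoTime()` for elapsed time and the standard `ThreadMXBean` for per-thread CPU time. The class `Timing` and its helper methods are hypothetical names, and the CPU figure is reported as -1 on JVMs that do not support thread CPU timing.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Sketch: measure elapsed (wall-clock) time and CPU time for one task. */
class Timing {
    /** Runs the task and returns {elapsedNanos, cpuNanos}; cpuNanos is -1 if unavailable. */
    static long[] measure(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        boolean cpuOk = bean.isCurrentThreadCpuTimeSupported();
        long cpuStart = cpuOk ? bean.getCurrentThreadCpuTime() : -1;
        long wallStart = System.nanoTime();
        task.run();
        long elapsed = System.nanoTime() - wallStart;
        long cpuEnd = cpuOk ? bean.getCurrentThreadCpuTime() : -1;
        long cpu = (cpuStart >= 0 && cpuEnd >= 0) ? cpuEnd - cpuStart : -1;
        return new long[] { elapsed, cpu };
    }

    /** Stand-in for an I/O wait: the thread is idle, so it uses almost no CPU time. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] t = measure(() -> sleepQuietly(50));
        System.out.printf("elapsed = %d ns, CPU = %d ns%n", t[0], t[1]);
    }
}
```

For a task dominated by waiting, the elapsed figure is far larger than the CPU figure, which is why the two are not interchangeable when comparing machines.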

Let’s start to design the machine for the CS211 CISC Computer!

[Figure: state diagram with states Reset, Init, Branch, Branch Taken, and Branch Not Taken]

Analyze LDR/STR Instructions

• From our analysis of LDR/LDA/STR instructions, what do we know?

– Memory Address Register (MAR)

– Memory Buffer Register (MBR)

– Program Counter (PC)

– 4 GPRs (given)

– Instruction Register (IR)

– Register Select Register (RSR)

– Instruction Operation Register (Opcode)

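[Figure: datapath block diagram showing Memory, MAR, MBR, the register file (R0, R1, R2, R3), RFI, IR, OpCode, the index registers (X1, X2, X3), ALU, PC, Carry, and Condition Codes]

• How do these hook together?

• How many registers do I need to access the RF? See the Mul/Div instructions.

• How do I hook in the Index Registers?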

Execution Structure

[Figure: execution structure with Data1 and Data2 inputs feeding each unit]

• xRR = result registers; they hold the result of an operation for storing on the next cycle

Comments on Multiplexers

• Both the arithmetic unit and the logic unit are “active” and produce outputs.

– The mux determines whether the final result comes from the arithmetic or logic unit.

– The output of the other one is effectively ignored.

• Our hardware scheme may seem like wasted effort, but it’s not really.

– “Deactivating” one or the other wouldn’t save that much time.

– We have to build hardware for both units anyway, so we might as well run them together.

• This is a very common use of multiplexers in logic design.
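The points above can be sketched in software: both units compute on every operation, and a 2-to-1 multiplexer picks which result leaves the ALU. This is an illustrative model only. The 18-bit word width, the choice of add and AND as the two operations, and the names `AluMux`, `mux2`, and `alu` are assumptions, not part of the slides' design.

```java
/** Sketch: arithmetic and logic units both compute; a mux selects one output. */
class AluMux {
    static final int WORD_MASK = (1 << 18) - 1; // ASSUMPTION: 18-bit words

    static int arithmeticUnit(int a, int b) {
        return (a + b) & WORD_MASK; // example arithmetic operation: add
    }

    static int logicUnit(int a, int b) {
        return (a & b) & WORD_MASK; // example logic operation: AND
    }

    /** 2-to-1 multiplexer: select == false passes in0, select == true passes in1. */
    static int mux2(boolean select, int in0, int in1) {
        return select ? in1 : in0;
    }

    static int alu(boolean selectLogic, int a, int b) {
        int arith = arithmeticUnit(a, b); // both units are always "active"
        int logic = logicUnit(a, b);
        return mux2(selectLogic, arith, logic); // the unselected output is ignored
    }

    public static void main(String[] args) {
        System.out.println(alu(false, 6, 3)); // arithmetic result
        System.out.println(alu(true, 6, 3));  // logic result
    }
}
```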

Shifter

• A shifter is most useful for arithmetic operations, since shifting left by k bits is equivalent to multiplying by 2^k.

– Shifting is necessary, for example, during floating-point arithmetic.

• The simplest shifter is the shift register, which can shift by one position per clock cycle.

• So, the number of positions shifted equals the number of clock cycles consumed.

• A barrel shifter can shift by any number of positions in one operation and allows rotations as well.
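The two kinds of shifter can be modeled as follows. This is a minimal sketch assuming an 18-bit word; the class `Shifter18` and its methods are hypothetical names. The loop in `shiftLeft` mimics a shift register (one position per iteration, standing in for one clock cycle), while `rotateLeft` computes the whole rotation in a single expression, as a barrel shifter would.

```java
/** Sketch: shift-register style shifting versus a single-step rotate. */
class Shifter18 {
    static final int WIDTH = 18;               // ASSUMPTION: 18-bit words
    static final int MASK = (1 << WIDTH) - 1;

    /** Shift left one position per loop iteration; bits shifted out are lost. */
    static int shiftLeft(int word, int count) {
        word &= MASK;
        for (int cycle = 0; cycle < count; cycle++) {
            word = (word << 1) & MASK;
        }
        return word;
    }

    /** Rotate left by count positions; bits leaving the top re-enter at the bottom. */
    static int rotateLeft(int word, int count) {
        word &= MASK;
        count = Math.floorMod(count, WIDTH);
        return ((word << count) | (word >>> (WIDTH - count))) & MASK;
    }

    public static void main(String[] args) {
        System.out.println(shiftLeft(3, 4));        // 3 * 2^4
        System.out.println(rotateLeft(0x20001, 1)); // top bit wraps around to bit 0
    }
}
```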

Adder

• The adder is probably the most studied digital circuit.

– There are a great many ways to perform binary addition, each with its own area/delay trade-offs.

– Adder delay is dominated by carry chain.

• Full Adder:

– Computes one-bit sum, carry:

– $s_i = a_i \oplus b_i \oplus c_i$

– $c_{i+1} = a_i b_i + a_i c_i + b_i c_i$
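The full-adder equations (sum = a XOR b XOR carry-in; carry-out = ab + a·carry-in + b·carry-in) can be chained bit by bit into a ripple-carry adder, which also shows why the carry chain dominates the delay: each bit must wait for the carry from the bit below it. This is one of the many adder designs the slide alludes to, sketched here under the assumption of an 18-bit word; the class and method names are hypothetical.

```java
/** Sketch: one-bit full adder chained into an 18-bit ripple-carry adder. */
class RippleCarryAdder {
    static final int WIDTH = 18; // ASSUMPTION: 18-bit words

    /** One-bit full adder; inputs are 0 or 1. Returns {sum, carryOut}. */
    static int[] fullAdder(int a, int b, int carryIn) {
        int sum = a ^ b ^ carryIn;
        int carryOut = (a & b) | (a & carryIn) | (b & carryIn);
        return new int[] { sum, carryOut };
    }

    /** Adds two words bit by bit. Returns {sum in WIDTH bits, final carry out}. */
    static int[] add(int x, int y) {
        int result = 0;
        int carry = 0;
        for (int i = 0; i < WIDTH; i++) {
            int[] bit = fullAdder((x >> i) & 1, (y >> i) & 1, carry);
            result |= bit[0] << i;
            carry = bit[1]; // the carry ripples into the next bit position
        }
        return new int[] { result, carry };
    }

    public static void main(String[] args) {
        int[] r = add(5, 7);
        System.out.println("sum = " + r[0] + ", carry out = " + r[1]);
    }
}
```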

Instruction Path

• Program Counter (PC)

– Keeps track of program execution

– Address of next instruction to read from memory

– May have auto-increment feature or use ALU

• Instruction Register (IR)

– Current instruction

– Includes ALU operation and address of operand

– Also holds target of jump instruction

– Also holds immediate operands

• Relationship to Data Path

– PC may be incremented through ALU or separate adder

– Contents of IR may also be required as input to ALU
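The roles of the PC and IR can be illustrated with a one-step fetch model. This is a sketch under stated assumptions rather than the course's simulator design: the 18-bit word, the 2048-word memory, the wrap-around of the PC, and the class `FetchCycle` are all hypothetical, and the PC is incremented by a plain addition that stands in for either the ALU or a separate adder.

```java
/** Sketch: instruction fetch using PC, MAR, MBR, and IR. */
class FetchCycle {
    static final int WORD_MASK = (1 << 18) - 1; // ASSUMPTION: 18-bit words
    final int[] memory = new int[2048];         // ASSUMPTION: memory size is illustrative
    int pc;   // address of the next instruction to read
    int mar;  // memory address register
    int mbr;  // memory buffer register
    int ir;   // current instruction

    /** One fetch: MAR <- PC, MBR <- memory[MAR], IR <- MBR, PC <- PC + 1. */
    void fetch() {
        mar = pc;
        mbr = memory[mar] & WORD_MASK;
        ir = mbr;
        pc = (pc + 1) % memory.length; // wraps at the end of memory in this sketch
    }

    public static void main(String[] args) {
        FetchCycle m = new FetchCycle();
        m.memory[6] = 0x1ABCD;
        m.pc = 6;
        m.fetch();
        System.out.printf("IR = %05X, PC = %d%n", m.ir, m.pc);
    }
}
```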

Questions?

• How will you do Scalar Integer Multiply/Divide?

– Just use the Java operators, but must be sure to do it only on 18 bits

– Think about using a small wrapper class that holds just 18 bits? (java.lang.Integer is final, so it cannot be subclassed.)

• There is no negating instruction. How will you compute the negative of a number?

• Should you use the Adder to increment the PC or just provide a separate adder circuit?

• How will you detect overflow/underflow when doing adding/subtracting?
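One possible starting point for these questions is sketched below. It assumes 18-bit words in two's-complement representation, which the slides do not state, so treat it as one option rather than the expected answer. Negation is done by inverting the bits and adding one, overflow is detected by comparing operand and result signs, and multiplication uses Java's `long` operators and then splits the product into two 18-bit halves. The class `Word18` and all method names are hypothetical.

```java
/** Sketch: 18-bit helper operations, assuming two's-complement words. */
class Word18 {
    static final int WIDTH = 18;
    static final int MASK = (1 << WIDTH) - 1;
    static final int SIGN = 1 << (WIDTH - 1);

    /** Interprets the low 18 bits as a signed two's-complement value. */
    static int toSigned(int word) {
        word &= MASK;
        return (word & SIGN) != 0 ? word - (1 << WIDTH) : word;
    }

    /** Negation without a negate instruction: invert all bits, then add one. */
    static int negate(int word) {
        return (~word + 1) & MASK;
    }

    /** Signed add overflows when both operands share a sign that the sum lacks. */
    static boolean addOverflows(int a, int b) {
        a &= MASK;
        b &= MASK;
        int sum = (a + b) & MASK;
        return ((a ^ sum) & (b ^ sum) & SIGN) != 0;
    }

    /** Signed multiply; returns {high 18 bits, low 18 bits} of the 36-bit product. */
    static int[] multiply(int a, int b) {
        long product = (long) toSigned(a) * toSigned(b);
        int high = (int) ((product >> WIDTH) & MASK);
        int low = (int) (product & MASK);
        return new int[] { high, low };
    }

    public static void main(String[] args) {
        System.out.println(toSigned(negate(5)));
        System.out.println(addOverflows(0x1FFFF, 1));
    }
}
```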

Simple Procedure Calls

• Using a procedure involves the following sequence of actions:

1. Put arguments in places known to procedure (registers)

2. Transfer control to procedure, saving the return address (JSR)

3. Acquire storage space, if required, for use by the procedure

4. Perform the desired task

5. Put results in places known to calling program (registers or elsewhere)

6. Return control to calling point (RFS)
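Steps 2 and 6 of the sequence above can be modeled as a pair of control transfers. This is a rough sketch, not the machine's defined semantics: it assumes JSR saves the return address in r3 and that RFS jumps back through r3, and it keeps the RFS return code in a separate hypothetical field, `returnCode`, rather than guessing which register receives it. The class `CallReturn` is also a hypothetical name.

```java
/** Sketch: control transfer for a simple call (JSR) and return (RFS). */
class CallReturn {
    final int[] r = new int[4]; // the 4 GPRs
    int pc;                     // program counter
    int returnCode;             // HYPOTHETICAL: where this sketch keeps the RFS code

    /** JSR: save the address of the next instruction, then jump to the procedure. */
    void jsr(int target) {
        r[3] = pc + 1; // ASSUMPTION: return address is kept in r3
        pc = target;
    }

    /** RFS: record the return code and resume at the saved return address. */
    void rfs(int code) {
        returnCode = code;
        pc = r[3];
    }

    public static void main(String[] args) {
        CallReturn m = new CallReturn();
        m.pc = 10;
        m.jsr(100);
        m.rfs(1);
        System.out.println("back at " + m.pc + " with code " + m.returnCode);
    }
}
```

Because the return address lives in a single register in this model, a nested call would overwrite it unless the procedure saves r3 first, which is one reason step 3 (acquiring storage) matters.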

Example: Finding the absolute value of an integer

        jsr  abs              ; assume integer in r0
        ….                    ; instruction after subroutine call
        …
abs     str  r0,0,<tempInt>   ; store r0 in <tempInt>, some location
        ldr  r1,0,smask       ; mask for sign bit = 100 000 000 000 000 000
        and  r1,r0            ; AND r1 and r0: if r0 bit is set it will be set in r1
        jz   r1,0,pos         ; test if sign = 0, e.g., r0 bit 0 is 0
        src  r0,1,1,1         ; shift r0 logical left 1 bit
        src  r0,1,0,1         ; shift r0 logical right – sets sign bit to 0
pos     rfs  1                ; return with 1 => true and r0 has absolute integer
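The same steps can be traced in Java to see what the routine computes. This is a sketch assuming 18-bit words with the sign in the leftmost bit, as the mask in the example indicates; the class `Abs18` and method `clearSign` are hypothetical names.

```java
/** Sketch: the sign-clearing steps of the abs example on an 18-bit word. */
class Abs18 {
    static final int MASK = (1 << 18) - 1;
    static final int SMASK = 1 << 17; // 100 000 000 000 000 000

    static int clearSign(int r0) {
        r0 &= MASK;
        int r1 = SMASK & r0;   // and r1,r0: isolate the sign bit
        if (r1 == 0) {
            return r0;         // jz r1,0,pos: sign already 0, nothing to do
        }
        r0 = (r0 << 1) & MASK; // shift logical left 1 bit: sign bit falls off
        r0 = r0 >>> 1;         // shift logical right 1 bit: 0 enters the sign position
        return r0;
    }

    public static void main(String[] args) {
        System.out.println(clearSign(SMASK | 5));
        System.out.println(clearSign(5));
    }
}
```

Clearing the sign bit yields the magnitude only if negative numbers are stored in sign-magnitude form. Under two's complement the all-ones word (which represents -1) would become 0x1FFFF rather than 1, so the routine's correctness depends on the representation the machine uses.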

Soooo!

• Convoluted?? Yes!

• Why??

1. No jump-on-less-than or jump-on-greater-than instructions!

2. Did we really need them or were they a matter of convenience?

E.g., how many instructions did we save by not having them?

3. Implicit use of r3
