CS6461 – Computer Architecture
Fall 2015
Instructor Morris Lancaster
Adapted from Professor Stephen Kaisler’s Slides
Lecture 2 - Basic System Design
Hierarchical System Architecture
CS6461 Computer Architecture - 2014
Dept. of Computer Science
Technology Trends
• Processor
– logic capacity: 2 x increase every 1.5-2 years;
– clock rate: about 25% per year
– overall performance: 1000 x in last decade
• Main Memory
– DRAM capacity: 2 x every 2 years; 1000 x size in last decade
– memory speed: about 10% per year
– cost / bit: improves about 25% per year
• Disk
– capacity: > 2 x increase every 1.5 years
– cost / bit: improves about 60% per year
– 120 x capacity in last decade
– Disk architecture not much different from IBM’s 10 MByte disks of the early 1980s
• Network Bandwidth
– Bandwidth: 1 Gbit/s standard to the desktop in many places
– Bandwidth: probably 1 Tbit/s by the end of the decade, but may require new infrastructure
Intel Processor Evolution
Processor Clock Speed
Cost Per GFLOP
Number of Servers Comprising the WWW
Technology Progress
• Transistors/chip: >100,000x since 1971
• Disk density: >100,000,000x since 1956
• Disk speed: 12.5x since 1956
The disk speed barrier dominates everything!
[Figure: Compound Annual Growth Rate (0% to 45%) of transistors/chip since 1971, disk density since 1956, and disk speed since 1956]
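The improvement factors above can be turned into the compound annual growth rates the chart plotted: CAGR = factor^(1/years) - 1. A minimal sketch in Java, assuming every period ends in 2014 (taken from the slide footer, not stated on the slide), so the rates are approximate; the class and method names are hypothetical.

```java
import java.util.Locale;

// Sketch: compound annual growth rate implied by a total improvement factor.
// Assumption: all periods end in 2014 (from the slide footer).
final class GrowthRates {
    // CAGR = factor^(1/years) - 1
    static double cagr(double factor, int years) {
        return Math.pow(factor, 1.0 / years) - 1.0;
    }

    public static void main(String[] args) {
        int endYear = 2014; // assumed end of the measurement period
        System.out.printf(Locale.ROOT, "Transistors/chip: %.1f%%%n",
                100 * cagr(100_000, endYear - 1971));      // 30.7%
        System.out.printf(Locale.ROOT, "Disk density:     %.1f%%%n",
                100 * cagr(100_000_000, endYear - 1956));  // 37.4%
        System.out.printf(Locale.ROOT, "Disk speed:       %.1f%%%n",
                100 * cagr(12.5, endYear - 1956));         // 4.5%
    }
}
```

Roughly 31% and 37% per year for transistors and disk density versus about 4.5% for disk speed: that gap is the "disk speed barrier".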
The “1,000,000:1” disk-speed barrier
• RAM access times ~5-7.5 nanoseconds
– CPU clock speed <1 nanosecond
– Interprocessor communication can be ~1,000X slower than on-chip
• Disk seek times ~2.5-3 milliseconds
– Limit = ½ rotation (for a 15,000 RPM disk)
– i.e., 1/30,000 minutes
– i.e., 1/500 seconds = 2 ms
Tiering brings it closer to ~1,000:1 in practice, but even so the difference is VERY BIG
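The ½-rotation arithmetic generalizes to any spindle speed: one rotation takes 60,000/RPM milliseconds, and half of that is the average rotational latency. This sketch reproduces the 2 ms figure for 15,000 RPM and computes the disk-to-RAM ratio for the slide's numbers (3 ms vs. 5 ns), which lands around 600,000:1, the same order of magnitude as the title's 1,000,000:1. The method names are hypothetical.

```java
// Sketch: half-rotation latency and the disk/RAM access-time ratio.
final class DiskLatency {
    // Half a rotation, in milliseconds, for a disk spinning at `rpm`.
    static double halfRotationMs(double rpm) {
        double msPerRotation = 60_000.0 / rpm; // 60,000 ms in a minute
        return msPerRotation / 2.0;
    }

    // How many RAM accesses fit in one disk access.
    static double ratio(double diskMs, double ramNs) {
        return (diskMs * 1_000_000.0) / ramNs; // 1 ms = 1,000,000 ns
    }

    public static void main(String[] args) {
        System.out.println(halfRotationMs(15_000)); // 2.0
        System.out.println(ratio(3.0, 5.0));        // 600000.0
    }
}
```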
State of the Art
• State-of-the-art PC (on your desk) now:
– Processor clock speed: ~4 GigaHertz
– Memory capacity: 2 to 8 GigaBytes (Windows 7 Home Basic limits to 8 GBytes; Windows 8 limits to 128 GBytes on x64)
– Disk capacity: 1 TByte for <$79; 2 TBytes for <$129. Wow!!
– In five years, we will need new units!
• Mega -> Giga -> Tera -> Peta -> Exa (Big Data!)
Intel 4004 Die Photo
• (2250 transistors, 12 mm², 108 kHz, 1970)
Intel 80486 Die Photo
• (1,200,000 transistors, 81 mm², 25 MHz, 1989)
Pentium Die Photo
• (3,100,000 transistors, 296 mm², 60 MHz, 1993)
I/O System Side
Each bus and adapter has its own specifications.
• Interfaces are where the problems are: between functional units, and between the computer and the outside world
• Need to design against constraints of performance, power, area, and cost
Issues
• Performance :
– the key to computing for most intensive problems
– what’s the secret? TIME, TIME, TIME (by analogy to real estate’s Location, Location, Location)
• Response Time :
– How long does it take for my job/program to run?
– How long does it take to execute my job/program?
[NOTE: These are not equivalent. Why not?]
– How long must I wait for a database query?
• Throughput :
– How many jobs can the machine run at once?
– What is the average execution rate?
– How much work is getting done?
– How long does it take to handle an interrupt?
• Execution Times :
– Elapsed Time: counts everything (disk and memory accesses, I/O waits, etc.); sometimes a useful number, but not good for comparison purposes
– CPU Time: counts instruction execution times, but not I/O time; basis for MIPS/MFLOPS; often divided into system time and user time
• Q: What are MIPS and MFLOPS good measures of, if anything?
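To make the MIPS question concrete: CPU time = instruction count × CPI / clock rate, and native MIPS = instruction count / (CPU time × 10^6). The sketch below uses made-up numbers for two hypothetical machines with different instruction sets. Both finish the same program in 1 second, yet one reports twice the MIPS, because MIPS ignores how much work each instruction does.

```java
// Sketch: the CPU performance equation and native MIPS.
// All instruction counts, CPIs, and clock rates below are hypothetical.
final class CpuPerf {
    // CPU time = instruction count * CPI / clock rate
    static double cpuTimeSeconds(long instructions, double cpi, double clockHz) {
        return instructions * cpi / clockHz;
    }

    // Native MIPS = instruction count / (execution time * 10^6)
    static double mips(long instructions, double seconds) {
        return instructions / (seconds * 1e6);
    }

    public static void main(String[] args) {
        // Machine A: simple instructions, so more of them, lower CPI
        double tA = cpuTimeSeconds(2_000_000_000L, 1.5, 3e9);
        // Machine B: richer instructions, so fewer of them, higher CPI
        double tB = cpuTimeSeconds(1_000_000_000L, 3.0, 3e9);
        System.out.println("A: " + tA + " s, " + mips(2_000_000_000L, tA) + " MIPS"); // A: 1.0 s, 2000.0 MIPS
        System.out.println("B: " + tB + " s, " + mips(1_000_000_000L, tB) + " MIPS"); // B: 1.0 s, 1000.0 MIPS
    }
}
```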
Let’s start to design the machine for the CS211 CISC Computer!
[Figure: control diagram with labels Reset, Init, Branch, Branch Taken, and Branch Not Taken]
Analyze LDR/STR Instructions
• From our analysis of the LDR/LDA/STR instructions, which registers do we know we need?
– Memory Address Register (MAR)
– Memory Buffer Register (MBR)
– Program Counter (PC)
– 4 GPRs (given)
– Instruction Register (IR)
– Register Select Register (RSR)
– Instruction Operation Register (Opcode)
[Figure: datapath components: MAR, MBR, Memory, IR, OpCode, RFI, general registers R0-R3, index registers X1-X3, ALU, PC, Carry, Condition Codes]
• How do these hook together?
• How many registers do I need to access the RF? See the Mul/Div instructions.
• How do I hook in the index registers?
Execution Structure
[Figure: execution structure, with a Data1/Data2 input pair feeding each unit]
xRR = result registers; they hold the result of an operation so it can be stored on the next cycle
Comments on Multiplexors
• Both the arithmetic unit and the logic unit are “active” and produce outputs.
– The mux determines whether the final result comes from the arithmetic or logic unit.
– The output of the other one is effectively ignored.
• Our hardware scheme may seem like wasted effort, but it’s not really.
– “Deactivating” one or the other wouldn’t save that much time.
– We have to build hardware for both units anyway, so we might as well run them together.
• This is a very common use of multiplexers in logic design.
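The scheme can be sketched behaviorally in Java: both the arithmetic unit (here an add) and the logic unit (here a bitwise AND) compute a result on every call, and a 2-to-1 multiplexer picks one. The 18-bit word width is taken from the project Questions slide; choosing ADD and AND as the two unit functions, and the names below, are illustrative assumptions.

```java
// Sketch: two always-active units feeding a 2-to-1 multiplexer.
final class MuxAlu {
    static final int MASK18 = (1 << 18) - 1; // assumed 18-bit word

    // 2-to-1 mux: select 0 passes in0, select 1 passes in1.
    static int mux2(int select, int in0, int in1) {
        return select == 0 ? in0 : in1;
    }

    static int alu(int a, int b, int select) {
        int arith = (a + b) & MASK18; // arithmetic unit: add, truncated to 18 bits
        int logic = (a & b) & MASK18; // logic unit: bitwise AND
        return mux2(select, arith, logic); // the unselected output is ignored
    }

    public static void main(String[] args) {
        System.out.println(alu(6, 3, 0)); // 9
        System.out.println(alu(6, 3, 1)); // 2
    }
}
```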
Shifter
• A shifter is most useful for arithmetic operations, since shifting left (right) by k positions is equivalent to multiplying (dividing) by 2^k.
– Shifting is necessary, for example, to align operands during floating-point arithmetic.
• The simplest shifter is the shift register, which can shift by one position per clock cycle.
• So the number of shifts equals the number of clock cycles consumed.
• A barrel shifter can shift by any number of positions in a single operation, and allows rotations as well.
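A minimal sketch contrasting the two designs on an assumed 18-bit word: a shift register that moves one position per clock and counts the cycles used, and a barrel shifter modeled as a single-step rotate. The class and method names are hypothetical.

```java
// Sketch: serial shift register vs. barrel-shifter rotate on an 18-bit word.
final class Shifter18 {
    static final int WIDTH = 18; // assumed word size
    static final int MASK = (1 << WIDTH) - 1;

    // Shift-register model: one position per clock cycle.
    // Returns {result, cyclesUsed}.
    static int[] shiftLeftSerial(int value, int count) {
        int v = value & MASK;
        int cycles = 0;
        for (int i = 0; i < count; i++) {
            v = (v << 1) & MASK; // bits shifted past bit 17 are lost
            cycles++;
        }
        return new int[] {v, cycles};
    }

    // Barrel-shifter model: rotate by any amount in a single step.
    static int rotateLeft(int value, int count) {
        int v = value & MASK;
        int k = Math.floorMod(count, WIDTH); // negative counts rotate right
        return ((v << k) | (v >>> (WIDTH - k))) & MASK;
    }

    public static void main(String[] args) {
        int[] r = shiftLeftSerial(5, 3);
        System.out.println(r[0] + " after " + r[1] + " cycles"); // 40 after 3 cycles (5 * 2^3)
        System.out.println(rotateLeft(0b10_0000_0000_0000_0001, 1)); // 3: bit 17 wraps to bit 0
    }
}
```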
Adder
• The adder is probably the most studied digital circuit.
– There are a great many ways to perform binary addition, each with its own area/delay trade-offs.
– Adder delay is dominated by the carry chain.
• Full Adder:
– Computes one-bit sum, carry:
– $s_i = a_i \oplus b_i \oplus c_i$ (XOR)
– $c_{i+1} = a_i b_i + a_i c_i + b_i c_i$
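The full-adder equations translate directly into code. Chaining one full adder per bit gives a ripple-carry adder, in which the carry must pass through every stage; that is why the carry chain dominates adder delay. This is a behavioral sketch rather than a gate-level model, and the 18-bit width used in the example is an assumption from the project slides.

```java
// Sketch: full adder and an n-bit ripple-carry adder built from it.
final class RippleAdder {
    // s = a XOR b XOR c;  carry out = ab + ac + bc (majority function)
    static int[] fullAdd(int a, int b, int c) {
        int s = a ^ b ^ c;
        int cout = (a & b) | (a & c) | (b & c);
        return new int[] {s, cout};
    }

    // Chain `width` full adders; each stage waits on the previous carry.
    static int[] add(int x, int y, int width) {
        int carry = 0;
        int sum = 0;
        for (int i = 0; i < width; i++) {
            int[] r = fullAdd((x >>> i) & 1, (y >>> i) & 1, carry);
            sum |= r[0] << i;
            carry = r[1];
        }
        return new int[] {sum, carry}; // carry = carry out of the top bit
    }

    public static void main(String[] args) {
        int[] r = add(12, 10, 18);
        System.out.println(r[0] + " carry " + r[1]); // 22 carry 0
        r = add((1 << 18) - 1, 1, 18);
        System.out.println(r[0] + " carry " + r[1]); // 0 carry 1
    }
}
```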
Instruction Path
• Program Counter (PC)
– Keeps track of program execution
– Address of next instruction to read from memory
– May have auto-increment feature or use ALU
• Instruction Register (IR)
– Current instruction
– Includes ALU operation and address of operand
– Also holds target of jump instruction
– Immediate operands
• Relationship to Data Path
– PC may be incremented through ALU or separate adder
– Contents of IR may also be required as input to ALU
Questions?
• How will you do Scalar Integer Multiply/Divide?
– Just use the Java operators, but be sure to keep the result to 18 bits
– Think about using a class that wraps an 18-bit value (java.lang.Integer is final, so it cannot be subclassed)
• There is no negating instruction. How will you compute the negative of a number?
• Should you use the Adder to increment the PC, or just provide a separate adder circuit?
• How will you detect overflow/underflow when doing adding/subtracting?
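One way to approach these questions, sketched under the assumption that the machine uses 18-bit two's-complement integers held in the low bits of a Java int (the slides do not state the encoding; under sign-magnitude, negation would just flip the sign bit). The class Word18 and its methods are hypothetical. Negation is invert-then-add-1; addition overflows when both operands have the same sign but the result's sign differs; multiply and divide use the Java operators on sign-extended values, then range-check or truncate to 18 bits.

```java
// Assumption: 18-bit two's-complement integers kept in the low 18 bits of an int.
final class Word18 {
    static final int BITS = 18;
    static final int MASK = (1 << BITS) - 1;
    static final int MIN = -(1 << (BITS - 1));    // -131072
    static final int MAX = (1 << (BITS - 1)) - 1; //  131071

    static int toWord(int v) { return v & MASK; }

    // Sign-extend bit 17 to recover the signed value.
    static int toSigned(int w) {
        return ((w & MASK) << (32 - BITS)) >> (32 - BITS);
    }

    // No negate instruction: invert every bit, then add 1.
    static int negate(int w) { return toWord(~w + 1); }

    // Add overflows when both operands share a sign and the result's sign differs.
    static boolean addOverflows(int a, int b) {
        int sum = toWord(a + b);
        int sa = (a >>> (BITS - 1)) & 1;
        int sb = (b >>> (BITS - 1)) & 1;
        int ss = (sum >>> (BITS - 1)) & 1;
        return sa == sb && ss != sa;
    }

    // Multiply with the Java operator; the full product may need up to 36 bits.
    static long multiply(int a, int b) { return (long) toSigned(a) * toSigned(b); }

    static boolean mulOverflows(int a, int b) {
        long p = multiply(a, b);
        return p < MIN || p > MAX;
    }

    // Java's / truncates toward zero and % gives the matching remainder.
    // Throws ArithmeticException if b is zero; MIN / -1 also exceeds 18 bits.
    static int[] divide(int a, int b) {
        int x = toSigned(a), y = toSigned(b);
        return new int[] {toWord(x / y), toWord(x % y)};
    }

    public static void main(String[] args) {
        System.out.println(toSigned(negate(toWord(5))));                // -5
        System.out.println(addOverflows(toWord(MAX), toWord(1)));       // true
        System.out.println(multiply(toWord(-3), toWord(7)));            // -21
        System.out.println(toSigned(divide(toWord(-7), toWord(2))[0])); // -3
    }
}
```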
Simple Procedure Calls
• Using a procedure involves the following sequence of actions:
1. Put arguments in places known to procedure (registers)
2. Transfer control to procedure, saving the return address (JSR)
3. Acquire storage space, if required, for use by the procedure
4. Perform the desired task
5. Put results in places known to calling program (registers or elsewhere)
6. Return control to calling point (RFS)
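The six steps can be traced on a toy machine. This sketch assumes, as the "implicit use of r3" remark suggests, that JSR saves the return address (PC + 1) in R3 and RFS jumps back through R3; the return code that RFS can carry is left out. The class, the procedure body, and the addresses are all hypothetical.

```java
// Hypothetical toy machine: a PC and four general registers R0-R3.
final class CallDemo {
    int pc;                     // program counter
    final int[] r = new int[4]; // R0-R3

    // Step 2: save the return address in R3 (assumed link register), then jump.
    void jsr(int target) { r[3] = pc + 1; pc = target; }

    // Step 6: return to the calling point saved in R3.
    void rfs() { pc = r[3]; }

    // Hypothetical procedure body: double the argument in R0 (18-bit result).
    void doubleR0() { r[0] = (r[0] * 2) & ((1 << 18) - 1); }

    static CallDemo run() {
        CallDemo m = new CallDemo();
        m.pc = 10;    // address of the JSR in the caller (hypothetical)
        m.r[0] = 21;  // 1. argument in a register known to the procedure
        m.jsr(100);   // 2. transfer control, saving the return address
                      // 3. this procedure needs no extra storage
        m.doubleR0(); // 4./5. do the work, leave the result in R0
        m.rfs();      // 6. return to the instruction after the JSR
        return m;
    }

    public static void main(String[] args) {
        CallDemo m = run();
        System.out.println("pc=" + m.pc + " r0=" + m.r[0]); // pc=11 r0=42
    }
}
```

Because JSR overwrites R3, a procedure that itself calls another procedure must save R3 first (part of step 3).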
Example: Finding the absolute value of an integer
        jsr  abs               ; assume integer in r0
        ….                     ; instruction after subroutine call
        …
abs     str  r0,0,<tempInt>    ; store r0 in <tempInt>, some location
        ldr  r1,0,smask        ; mask for sign bit = 100 000 000 000 000 000
        and  r1,r0             ; AND r1 and r0: if r0 bit is set it will be set in r1
        jz   r1,0,pos          ; test if sign = 0, i.e., r0 bit 0 is 0
        src  r0,1,1,1          ; shift r0 logical left 1 bit
        src  r0,1,0,1          ; shift r0 logical right: sets sign bit to 0
pos     rfs  1                 ; return with 1 => true and r0 has absolute integer
        …
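The routine can be mirrored in Java on 18-bit words, where the slide's "bit 0" (the sign bit) is the most significant bit, i.e. Java bit 17. Note that clearing the sign bit yields the magnitude only if integers are stored in sign-magnitude form; for a two's-complement value, the absolute value needs a negation (invert and add 1) instead, shown in the second method for comparison. The method names are hypothetical.

```java
// Sketch mirroring the abs subroutine on 18-bit words.
final class AbsDemo {
    static final int MASK18 = (1 << 18) - 1;
    static final int SMASK = 1 << 17; // 100 000 000 000 000 000 (binary): the slide's smask

    // Same steps as the slide: test the sign bit; if set, clear it by
    // shifting logical left 1, then logical right 1.
    static int absSignClear(int w) {
        w &= MASK18;
        if ((w & SMASK) == 0) return w; // jz: sign already 0
        w = (w << 1) & MASK18;          // src r0,1,1,1: sign bit drops out
        return w >>> 1;                 // src r0,1,0,1: new top bit is 0
    }

    // For comparison: two's-complement absolute value needs a negation.
    static int absTwosComplement(int w) {
        w &= MASK18;
        if ((w & SMASK) == 0) return w;
        return (~w + 1) & MASK18;
    }

    public static void main(String[] args) {
        int signMag = SMASK | 5;   // -5 in 18-bit sign-magnitude
        int twos = (-5) & MASK18;  // -5 in 18-bit two's complement
        System.out.println(absSignClear(signMag));   // 5
        System.out.println(absSignClear(twos));      // 131067: not |-5| for two's complement
        System.out.println(absTwosComplement(twos)); // 5
    }
}
```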
Soooo!
• Convoluted?? Yes!
• Why??
1. No jump-if-less-than or jump-if-greater-than instructions!
2. Did we really need them or were they a matter of convenience?
E.g., how many instructions did we save by not having them?
3. Implicit use of r3