Instructor: H. H. Ammar
Introduction to Computer Architectures
CpE442 Lec2.1
Overview of Today’s Lecture:
The Role of Performance
° Review from Last Lecture
° Definition and Measures of Performance
° Benchmarks
° Summarizing Performance and Performance Pitfalls
Review: What is "Computer Architecture"?
° Coordination of levels of abstraction
• Application
• Compiler, Operating System
• Instruction Set Architecture (Instr. Set Proc., I/O system)
• Digital Design
• Circuit Design
° Under a set of rapidly changing forces
Review: Levels of Representation
High-Level Language Program
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  -> Compiler
Assembly Language Program
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
  -> Assembler
Machine Language Program
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
  -> Machine Interpretation
Control Signal Specification
Review: Levels of Organization
° Computer (SPARCstation 20)
• Processor (SPARC): Control and Datapath
• Memory
• Devices: Input and Output
1. The HASE Architecture Simulation Environment
2. The New Compiler Technology simulation (shown in class)
3. MIPS Assembly Language Simulators
a. SPIM, a MIPS32 simulator: http://pages.cs.wisc.edu/~larus/spim.html
b. MARS (MIPS Assembler and Runtime Simulator): http://courses.missouristate.edu/kenvollmar/mars/
Review: Summary from Last Lecture
° All computers consist of five components
• Processor: (1) datapath and (2) control
• (3) Memory
• (4) Input devices and (5) Output devices
° Not all “memory” is created equal
• Cache: fast (expensive) memory is placed closer to the processor
• Main memory: less expensive memory, so we can have more
° Input and output (I/O) devices have the messiest organization
• Wide range of speeds: graphics vs. keyboard
• Wide range of requirements: speed, standards, cost, etc.
• Least amount of research (so far)
Metrics of performance (by level of the system)
° Application: response time, answers per month, operations per second
° Programming Language, Compiler, ISA: (millions of) instructions per second (MIPS); (millions of) floating-point operations per second (MFLOP/s)
° Datapath, Control: megabytes per second
° Function Units, Transistors, Wires, Pins: cycles per second (clock rate)
Relating Processor Metrics
The execution time of a given program on a given CPU architecture:
° CPU execution time = (CPU clock cycles / program) x clock cycle time
° or CPU execution time = (CPU clock cycles / program) ÷ clock rate
° Define CPI = the average number of clock cycles per instruction. CPI tells us something about the instruction set architecture, the implementation of that architecture, and the program being measured
° CPU clock cycles / program = (Instructions / program) x CPI
° or CPI = (CPU clock cycles / program) ÷ (Instructions / program)
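The relations above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers (2 million instructions, CPI of 2.5, 500 MHz clock) are assumptions made for the example, not values from the lecture.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU execution time = (instructions x CPI) / clock rate."""
    clock_cycles = instruction_count * cpi  # CPU clock cycles per program
    return clock_cycles / clock_rate_hz


def cpi_from_cycles(clock_cycles, instruction_count):
    """CPI = CPU clock cycles per program / instructions per program."""
    return clock_cycles / instruction_count


# Hypothetical program: 2 million instructions, CPI 2.5, 500 MHz clock.
example_time = cpu_time_seconds(2_000_000, 2.5, 500e6)
example_cpi = cpi_from_cycles(5_000_000, 2_000_000)

if __name__ == "__main__":
    print(f"CPU time = {example_time * 1e3:.1f} ms, CPI = {example_cpi}")
```

Dividing by the clock rate and multiplying by the clock cycle time are the same operation, since cycle time = 1 / clock rate.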
Aspects of CPU Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

                    instr. count    CPI    clock rate
Program                  X          (x)
Compiler                 X          (x)
Instr. Set Arch.         X           X
Organization                         X         X
Technology                                     X
Figures from a simulator comparing two compilers on the following code segment:
    for (i = 0; i < 3; i++) {
        int_a[i]++;
        int_b[i]++;
        flt_d[i] = flt_d[i] + flt_c[i];
    }
Organizational Trade-offs
° Levels: Application; Programming Language; Compiler; ISA; Datapath; Control; Function Units; Transistors, Wires, Pins
° Quantities traded off across these levels: Instruction Mix, CPI, Cycle Time
° Single-Cycle Processor Design: CPI = 1, large cycle time (slow clock)
° Multi-Cycle Processor Design: CPI > 1, smaller cycle time (faster clock)
CPI: “Average cycles per instruction”
CPI = (CPU Time * Clock Rate) / Instruction Count
= Clock Cycles / Instruction Count
The performance equation can be written using instruction classes, with the instruction count $I_i$ and $\text{CPI}_i$ for each class $i$:
$$\text{CPU time} = \text{Clock Cycle Time} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$$
$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i, \qquad F_i = \frac{I_i}{\text{Instruction Count}} \quad \text{("instruction frequency")}$$
See the example on the next slide.
Invest resources where time is spent!
Example
Base Machine (Reg / Reg), typical mix

Op        Freq (F_i)    CPI_i    CPI_i x F_i    % Time
ALU       50%           1        0.5            33%
Load      20%           2        0.4            27%
Store     10%           2        0.2            13%
Branch    20%           2        0.4            27%
Total                            1.5

The CPI = 1.5 cycles per instruction
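As a check on the base machine table, the weighted CPI and the share of time per instruction class can be recomputed with a short Python sketch. The frequencies and per-class CPIs are the ones on the slide; the names `base_mix`, `weighted_cpi`, and `time_shares` are chosen here for illustration.

```python
# Instruction mix of the base machine: op -> (frequency F_i, CPI_i).
base_mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def weighted_cpi(mix):
    """Overall CPI = sum over classes of CPI_i * F_i."""
    return sum(freq * cpi for freq, cpi in mix.values())


def time_shares(mix):
    """Fraction of execution time spent in each instruction class."""
    total = weighted_cpi(mix)
    return {op: freq * cpi / total for op, (freq, cpi) in mix.items()}


base_cpi = weighted_cpi(base_mix)
base_shares = time_shares(base_mix)

if __name__ == "__main__":
    print(f"CPI = {base_cpi:.2f}")
    for op, share in base_shares.items():
        print(f"{op:<7}{share:6.1%}")
```

The time shares show where optimization effort pays off: ALU operations are half of the instructions but only a third of the time.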
Assume a program of 1 million instructions. Compare the performance of the
Base Machine (B), with the above CPI and a 1 GHz clock, and the
Enhanced Machine (E), with a 1.333 GHz clock and a one-cycle increase for load/store and branch instructions.

Enhanced Machine (Reg / Reg)

Op        Freq (F_i)    CPI_i    CPI_i x F_i    % Time
ALU       50%           1        0.5            25%
Load      20%           3        0.6            30%
Store     10%           3        0.3            15%
Branch    20%           3        0.6            30%
Total                            2.0
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
= Perf. of E / Perf. of B = Exec. time of B / Exec. time of E
= (1.5 x 1 ns) / (2.0 x 0.75 ns) = 1
(cycle time is 1 ns at 1 GHz and 0.75 ns at 1.333 GHz; the instruction count cancels)
The performance of E is the same as that of B:
no gain in performance
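The comparison can be reproduced numerically. The sketch below assumes the 1.333 GHz clock is exactly 4/3 GHz (so the cycle time is 0.75 ns) and uses the CPIs of 1.5 and 2.0 from the two tables; the function and variable names are illustrative.

```python
def exec_time(instruction_count, cpi, clock_rate_hz):
    """Execution time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


INSTRUCTIONS = 1_000_000

# Base machine B: CPI 1.5 at 1 GHz.
time_b = exec_time(INSTRUCTIONS, 1.5, 1.0e9)
# Enhanced machine E: CPI 2.0 at 1.333 GHz (taken as 4/3 GHz, an assumption).
time_e = exec_time(INSTRUCTIONS, 2.0, 4.0e9 / 3.0)

# Speedup(E) = execution time without E / execution time with E.
speedup = time_b / time_e

if __name__ == "__main__":
    print(f"B: {time_b * 1e3:.3f} ms, E: {time_e * 1e3:.3f} ms, "
          f"speedup = {speedup:.3f}")
```

The faster clock is exactly offset by the higher CPI, which is why both machines take 1.5 ms.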
Rate Metrics:
MIPS (Millions of Instructions Per Second) and
MFLOPS (Millions of Floating-Point Operations Per Second)
MIPS = Instruction Count / (CPU Time * 10^6)
= Clock Rate / (CPI * 10^6)
• machines with different instruction sets?
• programs with different instruction mixes? (dynamic frequency of instructions)
• uncorrelated with performance
MFLOP/s = FP Operations / (Time * 10^6)
• machine dependent
• often not where time is spent
Example showing why MIPS can fail
Compare the performance of Compilers 1 and 2 for a given program on a given machine with a 1 GHz clock.

Instruction counts (in billions) by instruction class:
                      A     B     C
Compiler 1            5     1     1
Compiler 2           10     1     1
CPI for each class    1     2     3

Clock cycles using compiler 1 = 5x1 + 1x2 + 1x3 = 10 billion
Clock cycles using compiler 2 = 10x1 + 1x2 + 1x3 = 15 billion
CPU Time 1 = 10 billion cycles / 1 GHz = 10 secs
CPU Time 2 = 15 billion cycles / 1 GHz = 15 secs
Yet the MIPS ratings are
MIPS 1 = instr. count / (CPU time in secs x 10^6) = (5+1+1)/10 * 1000 = 700
MIPS 2 = (10+1+1)/15 * 1000 = 800
so compiler 2 has the higher MIPS rating even though its code takes 5 seconds longer to run.
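The same numbers can be worked through in Python. This is a sketch using the instruction counts, per-class CPIs, and 1 GHz clock from the example; the dictionary layout and function names are choices made here.

```python
CLOCK_RATE_HZ = 1e9  # 1 GHz clock, as in the example

# Cycles per instruction for classes A, B, C.
CLASS_CPI = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class produced by each compiler.
COUNTS = {
    "compiler1": {"A": 5e9, "B": 1e9, "C": 1e9},
    "compiler2": {"A": 10e9, "B": 1e9, "C": 1e9},
}


def clock_cycles(counts):
    """Total cycles = sum over classes of count_i * CPI_i."""
    return sum(n * CLASS_CPI[cls] for cls, n in counts.items())


def cpu_time(counts):
    """CPU time in seconds = clock cycles / clock rate."""
    return clock_cycles(counts) / CLOCK_RATE_HZ


def mips(counts):
    """MIPS = instruction count / (CPU time * 10^6)."""
    return sum(counts.values()) / (cpu_time(counts) * 1e6)


if __name__ == "__main__":
    for name, counts in COUNTS.items():
        print(f"{name}: {cpu_time(counts):.0f} s, {mips(counts):.0f} MIPS")
```

Compiler 2 wins on MIPS only because it executes many more cheap class-A instructions; execution time is what the user actually waits for.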
Why Do Benchmarks?
° How we evaluate differences
• Different systems
• Changes to a single system
° Provide a target
• Benchmarks should represent a large class of important programs
• Improving benchmark performance should help many programs
° For better or worse, benchmarks shape a field
° Good ones accelerate progress
• good target for development
° Bad benchmarks hurt progress
• Do they help real programs, or just sell machines/papers?
• Inventions that help real programs may not help the benchmark
Programs to Evaluate Processor Performance
° (Toy) Benchmarks
• 10-100 lines
• e.g., sieve, puzzle, quicksort
° Synthetic Benchmarks
• attempt to match average frequencies of real workloads
• e.g., Whetstone, Dhrystone
° Kernels
• Time-critical excerpts
° Real programs
• e.g., gcc, spice
Successful Benchmark: SPEC
http://www.spec.org/benchmarks.html
http://mrob.com/pub/comp/benchmarks/spec.html#CPU_06
° EE Times + 5 companies banded together to form the Systems Performance Evaluation Committee (SPEC): Sun, MIPS, HP, Apollo, DEC
° Created a standard list of programs, inputs, and reporting: some real programs, including OS calls and some I/O
SPEC second round: SPEC95
• 8 integer benchmarks in C and 10 floating-point benchmarks in Fortran
Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then
$$\text{ExTime(with } E) = \left((1-F) + \frac{F}{S}\right) \times \text{ExTime(without } E)$$
$$\text{Speedup(with } E) = \frac{\text{ExTime(without } E)}{\left((1-F) + \frac{F}{S}\right) \times \text{ExTime(without } E)} = \frac{1}{(1-F) + \frac{F}{S}} \le \frac{1}{1-F}$$
The speedup is bounded by this factor.
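A small Python sketch of the formula and its bound follows. The function names and the sample values (F = 0.4, S = 10) are assumptions made for illustration, not figures from the lecture.

```python
def amdahl_speedup(fraction, factor):
    """Speedup when a fraction F of the task is accelerated by a factor S."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction F must lie in [0, 1]")
    if factor <= 0.0:
        raise ValueError("factor S must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def speedup_bound(fraction):
    """Upper bound 1 / (1 - F), approached as S grows without limit."""
    if fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction)


# Hypothetical case: 40% of the task is made 10x faster.
sample_speedup = amdahl_speedup(0.4, 10.0)
sample_bound = speedup_bound(0.4)

if __name__ == "__main__":
    print(f"speedup = {sample_speedup:.4f}, bound = {sample_bound:.4f}")
```

Even an arbitrarily large S cannot push the speedup past the bound, because the unimproved 60% of the task still takes its full time.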
Performance Evaluation Summary
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
° Time is the measure of computer performance!
° Good products are created when we have:
• Good benchmarks
• Good ways to summarize performance
° Without good benchmarks and summaries, the choice is between improving the product for real programs and improving it to get more sales; sales almost always wins
° Remember Amdahl’s Law: speedup is limited by the unimproved part of the program
° HW 1, Submit via ecampus