COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Yaohang Li

Review



Last Class

Program and Computer

Compiler, Assembler, and Linker

Components of a Computer
This Class

Definition of Computer Performance

Measure of Computer Performance
Next Class

Quiz 1

Power Wall

Assignment 1

Which airplane has the best performance?
[Figure: bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 on passenger capacity, cruising range (miles), cruising speed (mph), and passengers × mph]
§1.4 Performance
An Analogy
Answer

That depends on …

If performance means

“the least time to transfer 1 passenger from one place to another”: Concorde

“the least time to transfer 450 passengers from one place to another”: Boeing 747
Performance can be defined in different
ways
Response Time and Throughput

Response time (AKA Execution Time)

Total time required for a computer to complete a task


Measured by time
Throughput (AKA Bandwidth)

Number of tasks completed (total work done) per unit time

e.g., tasks/transactions/… per hour
Response Time and Throughput

Assume each task on a computer is a serial task. How are response time and throughput affected by

Replacing the processor with a faster one?
Reduces response time
Increases throughput

Adding more processors?
Increases throughput
Response time stays the same
We’ll focus on response time for now…
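The two answers above can be checked with a toy model. This is only a sketch: it assumes every task is serial, takes the same time, and that enough tasks are queued to keep every processor busy. The function names and the 10-second task length are made up for illustration.

```python
def response_time(task_seconds: float, speedup: float = 1.0) -> float:
    """Time to finish one serial task on a processor `speedup` times faster."""
    return task_seconds / speedup


def throughput(task_seconds: float, processors: int = 1, speedup: float = 1.0) -> float:
    """Tasks completed per second, assuming every processor always has a task to run."""
    return processors / response_time(task_seconds, speedup)


# Hypothetical baseline: one processor, each task takes 10 seconds.
base = (response_time(10.0), throughput(10.0))

# Replace the processor with one that is twice as fast.
faster = (response_time(10.0, speedup=2.0), throughput(10.0, speedup=2.0))

# Keep the original processor but add a second one.
more = (response_time(10.0), throughput(10.0, processors=2))

for label, (rt, tp) in [("baseline", base), ("faster CPU", faster), ("two CPUs", more)]:
    print(f"{label:>10}: response time = {rt:.1f} s, throughput = {tp:.2f} tasks/s")
```

In this model the faster processor improves both numbers, while the extra processor improves only throughput, because a single serial task cannot be split across processors.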
Performance and Execution Time

Performance
$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
Relative Performance

“X is n times faster than Y”
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

Example: time taken to run a program
10s on A, 15s on B
$$\frac{\text{Execution Time}_B}{\text{Execution Time}_A} = \frac{15\,\text{s}}{10\,\text{s}} = 1.5$$
So A is 1.5 times faster than B
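The same ratio can be written as a small helper. This is an illustrative sketch; the function name `relative_performance` is not from the slides.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is the reciprocal of execution time, so
    Performance_X / Performance_Y = time_y / time_x.
    """
    return time_y / time_x


# Slide example: the program takes 10 s on A and 15 s on B.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n} times faster than B")
```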


Measuring Execution Time

Elapsed (wall-clock) time

Total response time, including all aspects

Processing, I/O, OS overhead, idle time

Determines system performance

CPU time

Time spent processing a given job

Discounts I/O time, other jobs’ shares

Comprises user CPU time and system CPU time

User CPU time: CPU time spent in the program itself
System CPU time: CPU time spent in the OS performing tasks on behalf of the program

Different programs are affected differently by CPU and system performance
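As a rough illustration of the difference between elapsed time and CPU time, the sketch below times a small workload with Python's standard library. The workload (`busy_work` plus a short sleep) is invented for the example; `time.perf_counter` reads a wall-clock timer, `time.process_time` reads user plus system CPU time for the current process, and `os.times` reports the two CPU components separately.

```python
import os
import time


def busy_work(n: int) -> int:
    """A purely CPU-bound loop (hypothetical workload)."""
    return sum(i * i for i in range(n))


wall_start = time.perf_counter()   # elapsed (wall-clock) timer
cpu_start = time.process_time()    # user + system CPU time of this process
times_start = os.times()           # user and system CPU time, separately

busy_work(200_000)
time.sleep(0.05)  # idle time: counts toward elapsed time, not CPU time

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
times_end = os.times()
user = times_end.user - times_start.user
system = times_end.system - times_start.system

print(f"elapsed time:    {wall:.4f} s")
print(f"CPU time:        {cpu:.4f} s")
print(f"user CPU time:   {user:.4f} s")
print(f"system CPU time: {system:.4f} s")
```

On most systems the elapsed time comes out larger than the CPU time here, since the sleep contributes only to the former. Exact numbers vary from run to run and from machine to machine.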
CPU Clocking

Operation of digital hardware governed by a
constant-rate clock
[Figure: clock timing diagram showing the clock period, clock cycles, data transfer and computation, and state update]

Clock period: duration of a clock cycle


e.g., 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
Clock frequency (rate): cycles per second

e.g., 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz
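Clock period and clock rate are reciprocals of each other, which is why the two examples above describe the same clock. A minimal sketch of the conversion (the helper names are illustrative):

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock rate in Hz for a given clock period in seconds."""
    return 1.0 / period_s


def clock_period_s(rate_hz: float) -> float:
    """Clock period in seconds for a given clock rate in Hz."""
    return 1.0 / rate_hz


period = 250e-12               # 250 ps, from the slide
rate = clock_rate_hz(period)   # should be about 4.0 GHz

print(f"period {period * 1e12:.0f} ps -> rate {rate / 1e9:.1f} GHz")
print(f"rate {rate / 1e9:.1f} GHz -> period {clock_period_s(rate) * 1e12:.0f} ps")
```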
CPU Time
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
Performance Improvement

Performance can be improved by either

Increasing the clock rate
=> Shorter clock period
=> More but shorter instructions
=> More clock cycles

Reducing the number of clock cycles
=> Longer clock period
=> Fewer but longer instructions
=> Lower clock rate
Hardware designer must often trade off
clock rate against cycle count
CPU Time Example


A program on Computer A: 2GHz clock, 10s CPU time
Designing Computer B

Aim for 6s CPU time
A faster clock is possible, but it requires 1.2 × as many clock cycles
How fast must Computer B's clock be?
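$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$
$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$
$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$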
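The arithmetic of this example can be replayed step by step. The variable names below are chosen for the sketch; the numbers are the ones given on the slide.

```python
# Given values for Computer A and the design target for Computer B.
clock_rate_a = 2e9        # 2 GHz
cpu_time_a = 10.0         # seconds
target_cpu_time_b = 6.0   # seconds
cycle_growth = 1.2        # B needs 1.2x as many clock cycles as A

# Clock Cycles = CPU Time x Clock Rate
cycles_a = cpu_time_a * clock_rate_a
cycles_b = cycle_growth * cycles_a

# Clock Rate = Clock Cycles / CPU Time
clock_rate_b = cycles_b / target_cpu_time_b

print(f"Clock cycles on A: {cycles_a:.3g}")
print(f"Clock cycles on B: {cycles_b:.3g}")
print(f"Required clock rate for B: {clock_rate_b / 1e9:.1f} GHz")
```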
Instruction Set Architecture

Instruction Set Architecture (ISA)

An abstract interface between the hardware
and the lowest-level software that
encompasses all the information necessary to
write a machine language program that will
run correctly




Repertoire of instructions
Registers
Memory access
I/O
Clock Cycles per Instruction (CPI)

Clock Cycles per Instruction (CPI)

Average number of clock cycles per
instruction for a program
Instruction Count and CPI
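$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$
$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$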

Instruction Count (IC) for a program


Determined by program, ISA and compiler
Average cycles per instruction


Determined by CPU hardware
If different instructions have different CPI

Average CPI affected by instruction mix
CPI Example




Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster, and by how much?
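$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
A is faster…
$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$
…by this much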
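A short sketch of the comparison above. Because both computers implement the same ISA, the instruction count I is the same on both and cancels in the ratio, so the value used below is an arbitrary placeholder rather than a number from the slides.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU Time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * cycle_time_s


# Arbitrary instruction count (an assumption); same ISA, so identical on A and B.
ic = 1_000_000

time_a = cpu_time(ic, cpi=2.0, cycle_time_s=250e-12)  # I x 500 ps
time_b = cpu_time(ic, cpi=1.2, cycle_time_s=500e-12)  # I x 600 ps

ratio = time_b / time_a
faster = "A" if time_a < time_b else "B"
print(f"{faster} is faster by a factor of {ratio:.2f}")
```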
CPI in More Detail

If different instruction classes take different
numbers of cycles
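$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

Weighted average CPI
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$
where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$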
CPI Example


Alternative compiled code sequences using
instructions in classes A, B, C
Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1
Sequence 1: IC = 5
Clock Cycles = 2×1 + 1×2 + 2×3 = 10
Avg. CPI = 10/5 = 2.0

Sequence 2: IC = 6
Clock Cycles = 4×1 + 1×2 + 1×3 = 9
Avg. CPI = 9/6 = 1.5
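The two sequences can be evaluated with the weighted-sum formula for clock cycles. The sketch below stores the per-class CPI and instruction counts from the example in dictionaries; the data layout and function names are choices made for this illustration.

```python
# CPI for each instruction class, from the example.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def clock_cycles(counts: dict) -> int:
    """Sum of CPI_i x Instruction Count_i over all instruction classes."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict) -> float:
    """Total clock cycles divided by total instruction count."""
    return clock_cycles(counts) / sum(counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in [("Sequence 1", sequence_1), ("Sequence 2", sequence_2)]:
    print(f"{name}: IC = {sum(seq.values())}, "
          f"cycles = {clock_cycles(seq)}, avg CPI = {average_cpi(seq)}")
```

Sequence 2 executes more instructions yet needs fewer clock cycles, so instruction count alone does not determine which sequence is faster.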


Summary


Response Time and Throughput
Performance Definition
Performance Measure

CPI (Cycles per Instruction)
IC (Instruction Count)
What I want you to do

Review Chapter 1