Chapter I Introduction

INTRODUCTION
Jehan-François Pâris
jparis@uh.edu
An evolving field
• Computer architectures keep changing
– Building faster computers
• Supercomputers and data centers
– Building cheaper, smaller computers
• Laptops, notebooks, netbooks, smartbooks
– Putting computer systems everywhere
• Cars, cell phones, HDTV: embedded computers
An analogy
• Electrical motors
– Replaced the single steam engine powering
many machines through transmission belts
and pulleys
– One electrical motor per machine
– Domestic appliances, car starters, …
– Power tools
– Power windows, electrical toothbrushes, …
The coming revolution
• Cannot increase CPU clock frequency much above
2 to 3 GHz without running into severe
heat dissipation problems
– Switch to multicore architectures
• Two, four, eight, … CPUs per chip
– Creates new problems
• Hardware: cache synchronization
• Software: programming these beasts (ouch!)
Other challenges
• Reducing power consumption of data centers
– Often contain archival data that are
very rarely accessed
• Finding new ways to keep increasing magnetic
disk capacity
• Dealing with physical limits to SDRAM density
– Will never get 8 TB SODIMM modules
• Finding a replacement for hard drives
Classical computer components
• Input
• Output
• Memory
• Datapath
• Control
– Datapath + Control = Processor
• Storage subsystem is missing!
A laptop motherboard
The course philosophy
• Showing you how computers
work is fine
• Showing you how to make
them faster is better!
PERFORMANCE ISSUES
• Defining performance
• Measuring it
– Not an easy task
• Evaluating the impact of
– Amount of work done by each instruction
– Time they take to run
– CPU clock speed
Measuring Performance
• Inverse of execution time of a benchmark
Performance = 1/Execution Time
• If computers A and B are such that
Execution Time_A < Execution Time_B
for the same benchmark, then
Performance_A > Performance_B
SPEC CPU Benchmark
• SPEC CPU2006
– Set of 12 integer and 17 floating-point
benchmarks
– Results are normalized:
Execution time on reference processor /
Execution time on benchmarked processor
– Single value is geometric mean of these ratios
How it is computed (I)
• Two new processors P and Q compared to
a reference processor R
• Execution times for n benchmarks
– P1, P2, …, Pn
– Q1, Q2, …, Qn
– R1, R2, …, Rn
How it is computed (II)
• SPEC value for processor P is
$$\mathrm{SPEC}_P = \sqrt[n]{\prod_{i=1}^{n} \frac{R_i}{P_i}}$$
• Observe that
$$\frac{\mathrm{SPEC}_P}{\mathrm{SPEC}_Q} = \sqrt[n]{\prod_{i=1}^{n} \frac{Q_i}{P_i}}$$
• (property of geometric mean)
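The geometric-mean definition can be checked numerically. The sketch below is only an illustration: the function name `spec_rating` and all execution times are made up for this example and are not real SPEC data. It computes the rating of two processors against a reference and shows that the ratio of the two ratings does not depend on the reference times.

```python
import math

def spec_rating(reference_times, measured_times):
    """Geometric mean of the per-benchmark ratios R_i / P_i."""
    ratios = [r / p for r, p in zip(reference_times, measured_times)]
    return math.prod(ratios) ** (1.0 / len(ratios))

# Hypothetical execution times in seconds (illustrative only).
R = [100.0, 200.0, 400.0]   # reference processor R
P = [50.0, 80.0, 100.0]     # processor P
Q = [40.0, 100.0, 200.0]    # processor Q

spec_p = spec_rating(R, P)
spec_q = spec_rating(R, Q)

# Geometric mean of Q_i / P_i, computed without the reference times.
direct = spec_rating(Q, P)

print(f"SPEC_P = {spec_p:.3f}, SPEC_Q = {spec_q:.3f}")
print(f"SPEC_P / SPEC_Q = {spec_p / spec_q:.3f}, direct = {direct:.3f}")
```

The two values on the last line agree, which is why the choice of reference processor does not change how P and Q compare.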
Impact of Instruction Set
• Execution Time =
Number of Instructions ×
Mean Instruction Execution Time
– Gave birth to the idea of more complex
instruction sets
• Each does more
• Fewer instructions
Impact of Clock Speed
• Execution Time =
Number of Clock Cycles × Clock Cycle Time
same as
Execution Time =
Number of Clock Cycles / Clock Frequency
Putting everything together
• Execution Time =
Number of Instructions ×
Number of Clock Cycles per Instruction ×
Clock Cycle Time
• Gives us three ways to reduce program
execution time
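As a rough illustration of the three factors, the sketch below evaluates the formula for a made-up program. The function name `execution_time` and every number in it are assumptions chosen for the example, not measurements.

```python
def execution_time(instruction_count, cycles_per_instruction, clock_hz):
    """Execution time in seconds: instructions x CPI x clock cycle time."""
    cycle_time = 1.0 / clock_hz          # seconds per clock cycle
    return instruction_count * cycles_per_instruction * cycle_time

# Hypothetical baseline: 2 billion instructions, CPI of 1.5, 2 GHz clock.
base = execution_time(2e9, 1.5, 2e9)

# Changing one factor at a time (all values are illustrative).
fewer_instructions = execution_time(1.5e9, 1.5, 2e9)
faster_clock = execution_time(2e9, 1.5, 3e9)
lower_cpi = execution_time(2e9, 1.0, 2e9)

for label, t in [("baseline", base),
                 ("fewer instructions", fewer_instructions),
                 ("faster clock", faster_clock),
                 ("lower CPI", lower_cpi)]:
    print(f"{label:20s} {t:.3f} s")
```

In practice the three factors are not independent: a more complex instruction set may lower the instruction count but raise the number of cycles per instruction.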
1. Using fewer instructions
• VAX
– Super minicomputer designed in late 70’s
– Had a complicated instruction set (CISC)
– Idea was to use more powerful instructions in
order to reduce the number of instructions
used to perform most frequent tasks
– Poor pipelining performance
2. Using a faster clock
• Major reason for explosion of CPU performance
in the 80’s and 90’s
– IBM PC (1981):
Intel 8088 @ 4.77 MHz
– IBM PC AT (1984):
Intel 80286 @ 6 and 8 MHz
– Nowadays up to 3 GHz
• Cannot get much higher!
3. Using better instructions
• Best strategy is to reduce the average number of
clock cycles per instruction
– Favoring fast instructions
– Using fixed-size instructions to allow
pipelining
– Trying to execute as many tasks as possible
in parallel
Amdahl’s Law (I)
• Examples:
– Supersonic jet
• Could fly from Houston to Washington in
thirty minutes
• Total travel time would be dominated by
travel time to airport and check in
procedures
– Today's laptops:
• Disk access times are the bottleneck
Amdahl’s Law (II)
• Assume that we have a technique for improving
the performance of some part of a system.
• Let
– T_o be the time originally spent in the part of
the system that can be improved
– T_i be the time spent in that part once the
improvement has been applied
– T_n be the time spent in the part of the
system that remains unaffected
Amdahl’s Law (III)
• The total speedup for the whole system will be
$$\text{Speedup} = \frac{T_n + T_o}{T_n + T_i}$$
• The maximum possible speedup when $T_i \to 0$
$$\text{Speedup} = \frac{T_n + T_o}{T_n}$$
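The two speedup formulas translate directly into code. This is a minimal sketch: the function and parameter names are chosen here for illustration, and the times are hypothetical.

```python
def speedup(t_unaffected, t_original, t_improved):
    """Overall speedup: (Tn + To) / (Tn + Ti)."""
    return (t_unaffected + t_original) / (t_unaffected + t_improved)

def max_speedup(t_unaffected, t_original):
    """Upper bound on the speedup, reached when Ti goes to 0."""
    return speedup(t_unaffected, t_original, 0.0)

# Hypothetical system: 4 time units unaffected, 6 units in the improvable
# part, reduced to 2 units by the improvement.
actual = speedup(4.0, 6.0, 2.0)
bound = max_speedup(4.0, 6.0)

print(f"speedup = {actual:.3f}, upper bound = {bound:.3f}")
```

However large the improvement, the result cannot exceed the bound set by the unaffected part.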
An example
• Flying to Washington National Airport takes
three hours
• Going to the airport and waiting for the flight
takes a minimum of two hours
• Going from the airport to Washington downtown
takes a minimum of 30 minutes
• What is the maximum speedup that could be
achieved using much faster planes?
Answer
• Current travel time:
– To airport and wait: 2 hours
– Plane: 3 hours
– To downtown by DC metro: 30 minutes
– Total: 5 hours 30 minutes
Answer
• Assume plane travels at speed of light:
– To airport and wait: 2 hours
– Plane: negligible
– To downtown by DC metro: 30 minutes
– Total: 2 hours 30 minutes
• Maximum speedup would be
5h30 / 2h30 = 2.2
Trains and buses
• Commuter trains and city buses spend a
significant amount of trip time letting
travelers off and on
– Have wide doors
• Not true for Amtrak trains and intercity buses
– Fewer, narrower doors
A problem
• Assume we have a technique that reduces the
execution time of floating-point operations by
20 percent
• What will be the overall CPU speedup if we
expect it to spend 10 percent of its time
executing floating-point operations?
• How would that speedup be affected if the CPU
spends 30 percent of its time executing
floating-point operations?
Solution (I)
• First case:
– Baseline time = 0.9 × 1 + 0.1 × 1 = 1
– After improvement = 0.9 × 1 + 0.1 × 0.8
= 0.98
– Speedup = 1/0.98 = 1.02
• A 2 percent improvement!
Solution (II)
• Second case:
– Baseline time = 0.7 × 1 + 0.3 × 1 = 1
– After improvement = 0.7 × 1 + 0.3 × 0.8
= 0.94
– Speedup = 1/0.94 = 1.064
• A 6.4 percent improvement!
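Both cases can be reproduced with a small helper. The sketch below follows the interpretation used in the solution, namely that the improved part takes 0.8 times its original time. The function name `overall_speedup` and its parameters are chosen for this example only.

```python
def overall_speedup(fraction_improved, time_factor):
    """Speedup when a fraction of the baseline time is scaled by time_factor.

    fraction_improved: share of the baseline run time spent in the improved part
    time_factor: new time / old time for that part (0.8 means 20 percent less)
    """
    new_time = (1.0 - fraction_improved) + fraction_improved * time_factor
    return 1.0 / new_time

for fraction in (0.1, 0.3):
    s = overall_speedup(fraction, 0.8)
    print(f"{fraction:.0%} floating point: speedup = {s:.3f}")
```

The larger the share of time spent in the improved part, the more the improvement shows in the overall speedup.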
REVIEW PROBLEMS
Problem
• Consider a huge program that consists of a
purely sequential part that takes two hours and
another part that takes eight hours.
What is the maximum speedup we can
achieve by parallelizing the second part of the
program?
Answer
• Current run time:
– Sequential part: 2 hours
– Other part: 8 hours
– Total: 10 hours
• Minimum run time:
– Sequential part: 2 hours
– Other part: negligible
– Total: 2 hours
• Maximum speedup: 10/2 = 5
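To see how the limit of 5 is approached, the sketch below assumes that the eight-hour part divides perfectly evenly among the workers, with no communication or synchronization cost. That assumption is not part of the problem statement, and real programs rarely scale this well.

```python
def parallel_speedup(sequential_hours, parallel_hours, workers):
    """Speedup assuming the parallel part splits evenly among the workers."""
    original = sequential_hours + parallel_hours
    parallelized = sequential_hours + parallel_hours / workers
    return original / parallelized

# Two sequential hours and eight parallelizable hours, as in the problem.
for n in (2, 4, 8, 64, 1024):
    print(f"{n:5d} workers: speedup = {parallel_speedup(2.0, 8.0, n):.2f}")
```

Even with 1024 workers the speedup stays below 5, because the two sequential hours are never reduced.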
Problem
• Server motherboard A has a SPEC CPU2006
rating of 31.4 while server motherboard B has a
rating of 29.7. Which one of the two
motherboards is faster?
Answer
• Motherboard A because a higher SPEC
value is better
Fun problem
• Shanghai maglev train runs at 268 mph
• How does it compare to airplane for going
between Houston and Washington, DC?
Fun answer
• Current travel time:
– To airport and wait: 2 hours
– Plane: 3 hours
– To downtown by DC metro: 30 minutes
– Total: 5 hours 30 minutes
• With maglev:
– To station: 1 hour
– Train to downtown DC: 6 hours 30 minutes
– Total: 7 hours 30 minutes
• Plane is still faster for very long trips