
Computer Abstractions & Technology

Yu-Ting Tsai
Department of Computer Science & Engineering
Yuan Ze University, Taiwan
Textbook (Patterson) Chap. 1.1~1.4, 1.6, 1.10
Video (Prof. Liu) L01-01, L01-03~05
Reference (Hennessy) Chap. 1
2
• Computers are pervasive nowadays
CS250A - Assembly Language & Computer Organization
3
• Novel Computer Applications
• Computers in automobiles
• Mobile phones
• Human genome project
• World Wide Web
• Cloud computing
• …
4
• Desktop/Laptop (Personal) Computers
– Personal usage with a variety of software
– Subject to cost/performance tradeoff
• Servers
– Modern form of mainframes & supercomputers
– High capacity, performance, & reliability
– Range from small servers to datacenters
5
• Embedded Computers
– Span the widest range of applications
• Mobile phones, automobile computers, video game consoles, …
– Hidden as components of systems
– Stringent power/performance/cost constraints
6
• General Purpose (PCs & Servers)
– Commercial (integer), scientific (FP, graphics),
home (integer, audio, video, graphics)
– Software compatibility is the most important requirement
– Short product life, higher price & profit margin
– Operating system serves as another interface above the architecture
7
• Embedded Computers
– A computer inside another device used for
executing a specific application
– Examples
• Input/Output devices
– Printers, disks, …
• Consumer electronics
– Video game consoles, CD players, PDAs, …
8
• Embedded Computers
Lego Mindstorms
Robotic Command Explorer: a “Programmable Brick” with a Hitachi H8 CPU (8-bit), 32 KB RAM, LCD, batteries, infrared transmitter/receiver, 4 control buttons, & 6 connectors
9
• Embedded Computers
10
• Embedded Computers
– Large variety in architecture, performance, &
on-chip peripherals
• Compatibility is often not important
• New architectures can enter the market easily
• Low power is important
– Usually sold in large volumes at low prices
11
• Application Software
• Written in high-level language (HLL)
• System Software
• Compiler translates HLL code to
machine code
• Operating system (service code)
– Handling input/output, managing
memory & storage, …
• Hardware
• Processor, memory, I/O controllers, …
12
High-Level Language Program
    temp = v[0];
    v[0] = v[1];
    v[1] = temp;
  ↓ Compiler
Assembly Language Program
    lw  $15, 0($2)
    lw  $16, 4($2)
    sw  $16, 0($2)
    sw  $15, 4($2)
  ↓ Assembler
Machine Language Program
    1000 1100 0100 1111 0000 0000 0000 0000
    1000 1100 0101 0000 0000 0000 0000 0100
    1010 1100 0101 0000 0000 0000 0000 0000
    1010 1100 0100 1111 0000 0000 0000 0100
  ↓ Machine Interpretation
Control Signal Specification
    ALUOP[0:3] <= InstReg[9:11] & MASK
    …
13
Processor(s) (Active)
– Control (Brain)
– Datapath (Brawn)
Memory (Passive)
Devices
– Input
– Output
14
Processor(s) (Active)
– Control (Brain)
– Datapath (Brawn)
Memory (Passive): where programs & data live when running
Devices
– Input: keyboard, mouse, …
– Output: display, printer, …
– Disk: where programs & data live when not running
15
• The Processor: Brain of the Computer
– Control unit tells the datapath, memory, & I/O devices what to do
• Decodes & dispatches instructions
– Datapath performs arithmetic & logical operations using arithmetic logic units (ALUs)
• Based on the binary number system
– Cache memory
• Small, fast memory for quick data access
16
• Basic Functionality of Control Unit
– Fetch/Execute cycle
• Steps that the CPU takes to execute an instruction
• Fetch instruction to which PC points → Increment PC → Execute fetched instruction → (repeat)
– Program counter (PC)
• Holds memory address of current instruction
17
• AMD Barcelona (4 Processor Cores)
18
• AMD Barcelona (4 Processor Cores)
19
• Volatile Memory
– Lose instructions & data
when powered off
– DRAM, SRAM, …
• Non-Volatile Memory
– Flash memory, hard drives, optical disks (CD, DVD, Blu-ray Disc), …
20
• Accessories that allow the computer to perform specific tasks
– Receive information for processing
– Return results of processing
– Store information
• Common Input/Output Devices
– Keyboard, mouse, scanner, display, speakers,
printer, hard drive, CD, DVD, …
21
• Communication & Resource Sharing
– Local area network (LAN): Ethernet, …
– Wide area network (WAN): Internet, …
– Wireless network: Wi-Fi, Bluetooth, LTE, …
22
• Scope
– Capabilities & performance characteristics of
functional units
• Registers, ALU, shifters, ...
– Ways in which hardware components are
interconnected
• Structure, …
– Information flows between components
• Data, datapath, …
23
• Scope
– Logic & means by which such information flow
is controlled
– Register transfer level (RTL) description
• A digital system is specified by
– Set of registers
– Operations performed on stored data
– Controllers that supervise the sequence of operations
24
• Computer Components & Their Relations
– ISA + Computer organization
Software
– Applications
– Operating System (Windows, Unix, Linux, iOS, …)
– Compiler, Assembler
Instruction set architecture (ISA): boundary between software & hardware
Hardware
– Processor, Memory, I/O system
– Datapath & Control
– Digital Design
– Circuit Design
– Transistors
Computer organization: Processor, Memory, I/O system; Datapath & Control
25
• An Important Computer Abstraction
– Interface between hardware & low-level
software
– Standardizes instructions, machine language bit patterns, …
– May include
• Instruction set & formats
• Modes of memory addressing
• Exceptional conditions
• …
26
• Advantages
– Allows different implementations of the same architecture
• Disadvantages
– Sometimes prevents the adoption of new innovations
• Modern Instruction Set Architectures
– x86 series, PowerPC, MIPS, SPARC, ARM, …
27
• Examples
– DEC Alpha (v1, R, B, M, F, C, T): 1992-2001
– HP PA-RISC (v1.0, v1.1, v2.0): 1986-1996
– Sun SPARC (v7, v8, v9, …): 1987-
– SGI MIPS (MIPS I, II, III, IV, V, …): 1985-
– x86 Series (x86-16, IA-32, IA-64, x64, MMX, SSE, AVX, ...): 1978-
– ARM (v1, v2, …, v9, …): 1985-
– RISC-V (v1, v2, 20xxxxxx, …): 2010-
28
• Instruction Categories
– Load/Store
– Computational
– Jump & branch
– Floating point
– Memory management
– Special
• Registers
– R0 - R31, PC, HI, LO
• Example Instructions (All 32-bit Wide)
– R-format: OP | rs | rt | rd | shamt | funct
– I-format: OP | rs | rt | immediate
– J-format: OP | jump target
29
• Which airplane has the best performance?
Concorde
• Capacity: 132 persons
• Range: 4000 miles
• Cruising speed: 1320 mph (Mach 2.02) at 60,000 feet
747-400
• Capacity: 470 persons
• Range: 4150 miles
• Cruising speed: 567 mph (Mach 0.85) at 35,000 feet
30
• Which airplane has the best performance?
[Figure: bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, & Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), & passengers × mph]
31
• Algorithm
– Determines number of operations executed
• Programming Language, Compiler, &
Architecture
– Determine number of machine instructions
executed per operation
32
• Processor & Memory System
– Determine how fast instructions are executed
• I/O System (including Operating System)
– Determines how fast I/O operations are
executed
33
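• Performance of Computer X
$\text{Performance}_X = \frac{1}{\text{ExecutionTime}_X}$
• Relative Performance
– What does "X is n times faster than Y" mean?
• $n = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{ExecutionTime}_Y}{\text{ExecutionTime}_X}$
– Example (time taken to run a program)
• $\text{ExecutionTime}_X = 10\,\text{s}$, $\text{ExecutionTime}_Y = 15\,\text{s}$
• $\frac{\text{ExecutionTime}_Y}{\text{ExecutionTime}_X} = \frac{15\,\text{s}}{10\,\text{s}} = 1.5$
• X is 1.5 times faster than Y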
34
• CPU operations are governed by a constant-rate clock
– Clock period: duration of a clock cycle
• Example: 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
– Clock frequency (rate): cycles per second
• Example: 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz
[Figure: clock signal marking one clock period; within each cycle, data transfer & computation is followed by a state update]
35
• Definition (CPU time)
– Time the CPU spends processing a given task
• Excludes I/O time & time spent on other tasks
• Comprises user CPU time & system CPU time
– $\text{CPUTime} = \frac{\text{ClockCycles}}{\text{ClockRate}} = \text{ClockCycles} \times \text{ClockCycleTime}$
36
• Definition
– $\text{CPUTime} = \frac{\text{ClockCycles}}{\text{ClockRate}} = \text{ClockCycles} \times \text{ClockCycleTime}$
• How to Improve Performance?
– Reduce cycle count (number of clock cycles)
– Increase clock rate (or reduce clock cycle time)
– Hardware designer must often trade off clock
rate against cycle count
37
• Example
– Computer A: 2 GHz clock rate, 10 s CPU time
– Design computer B & aim for 6 s CPU time
• Faster clock rate, but B needs 1.2× as many clock cycles
– How fast must the clock rate of computer B be?
• $\text{ClockCycles}_A = \text{CPUTime}_A \times \text{ClockRate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$
• $\text{ClockRate}_B = \frac{\text{ClockCycles}_B}{\text{CPUTime}_B} = \frac{1.2 \times \text{ClockCycles}_A}{\text{CPUTime}_B} = \frac{1.2 \times 10\,\text{s} \times 2\,\text{GHz}}{6\,\text{s}} = 4\,\text{GHz}$
38
• Number of Executed C Instructions
      a = b - c;
      for (i = a; i > 0; i--)
          sum = sum + x;
• Number of Executed Machine Instructions
              sub   $r1, $r2, $r3
      Loop:   blez  $r1, End
              add   $r8, $r8, $r10
              addi  $r1, $r1, -1
              j     Loop
      End:
39
• Number of Executed C Instructions
      a = b - c;
      for (i = a; i > 0; i--)
          sum = sum + x;
• Number of Executed Machine Instructions
              sub   $r1, $r2, $r3
      Loop:   blez  $r1, End
              add   $r8, $r8, $r10
              addi  $r1, $r1, -1
              j     Loop
      End:
• Dynamic instruction count: 1 (sub) + 4 per iteration (blez, add, addi, j) + 1 (final blez)
– Loop runs 10 times → 42 instructions
– Loop runs 20 times → 82 instructions
40
• Definition (clock cycles per instruction, CPI)
– Average number of clock cycles that each
instruction takes to execute
– CPI is determined by CPU hardware
– If different instructions have different CPI
• Average CPI is affected by instruction mix
41
• Instruction Count for a Program
– Determined by program, ISA, & compiler
• Redefine CPU Time
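– $\text{ClockCycles} = \text{InstructionCount} \times \text{CPI}$
– $\text{CPUTime} = \text{ClockCycles} \times \text{ClockCycleTime} = \text{InstructionCount} \times \text{CPI} \times \text{ClockCycleTime} = \frac{\text{InstructionCount} \times \text{CPI}}{\text{ClockRate}}$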
42
• Example
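– Computer A: $\text{CycleTime}_A = 250$ ps, $\text{CPI}_A = 2.0$
– Computer B: $\text{CycleTime}_B = 500$ ps, $\text{CPI}_B = 1.2$
– Which is faster & by how much (same ISA, so the same instruction count $I$)?
• $\text{CPUTime}_A = \text{InstructionCount}_A \times \text{CPI}_A \times \text{CycleTime}_A = I \times 2.0 \times 250\,\text{ps}$
• $\text{CPUTime}_B = \text{InstructionCount}_B \times \text{CPI}_B \times \text{CycleTime}_B = I \times 1.2 \times 500\,\text{ps}$
• $\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{CPUTime}_B}{\text{CPUTime}_A} = \frac{I \times 1.2 \times 500\,\text{ps}}{I \times 2.0 \times 250\,\text{ps}} = 1.2$
• A is 1.2 times faster than B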
43
• Definition Revisited
– $\text{CPUTime} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{ClockCycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{ClockCycle}}$
• Performance Dependence
[Table: which of Instruction Count, CPI, & Clock Rate are affected by the Program, Compiler, Instruction Set, Organization, & Technology]
44
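• If different instruction classes take different numbers of cycles
– $\text{ClockCycles} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \text{InstructionCount}_i \right)$
• Weighted Average CPI
– $\text{CPI} = \frac{\text{ClockCycles}}{\text{TotalInstructionCount}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{InstructionCount}_i}{\text{TotalInstructionCount}} \right)$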
45
• Example
– Code with instructions in classes A, B, C
• Sequence 1: $\text{InstructionCount}_1 = 5$
• $\text{ClockCycles}_1 = 2 \times 1 + 1 \times 2 + 2 \times 3 = 10$
• $\text{WeightedAverageCPI}_1 = 10 / 5 = 2.0$
Class | CPI for class | IC in sequence 1 | IC in sequence 2
A     | 1             | 2                | 4
B     | 2             | 1                | 1
C     | 3             | 2                | 1
46
• Example
– Code with instructions in classes 𝐴, 𝐵, 𝐶
• Sequence 2: $\text{InstructionCount}_2 = 6$
• $\text{ClockCycles}_2 = 4 \times 1 + 1 \times 2 + 1 \times 3 = 9$
• $\text{WeightedAverageCPI}_2 = 9 / 6 = 1.5$
47
• Speedup Due to Enhancement E
– $\text{Speedup}(E) = \frac{\text{ExecutionTime}(E')}{\text{ExecutionTime}(E)} = \frac{\text{Performance}(E)}{\text{Performance}(E')}$, where E′: without E
– If E accelerates a fraction F of the task by a factor S & the remainder of the task is unaffected
• $\text{ExecutionTime}(E) = \left( (1 - F) + \frac{F}{S} \right) \times \text{ExecutionTime}(E')$
• $\text{Speedup}(E) = \frac{1}{(1 - F) + \frac{F}{S}} \approx \frac{1}{1 - F}$ (for $S \to \infty$)
48
• Low Power at Idle
– From the SPECpower benchmark (percentages are relative to power at 100% load)
Manufacturer | Processor | 100% load power | 50% load power | 10% load power | Active idle power
HP | Xeon E5440 | 269 W | 227 W (84%) | 174 W (65%) | 160 W (59%)
Dell | Xeon E5440 | 276 W | 230 W (83%) | 173 W (63%) | 157 W (57%)
Fujitsu Siemens | Xeon X3220 | 132 W | 110 W (83%) | 85 W (65%) | 80 W (60%)
49
• Low Power at Idle
– Example: Google datacenter
• Mostly operates at 10%-50% load
• At 100% load less than 1% of the time
– Processors could be redesigned to achieve power-proportional computing, where power consumption scales with load
50
• Amdahl's Law
– Pitfall: improving only a portion of a computer & expecting a proportional improvement in overall performance
• $T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{ImproveFactor}} + T_{\text{unaffected}}$
– Corollary
• Make the common case fast
51
• Amdahl's Law
– Example
• Multiplication operations account for 80 s out of 100 s
• How much must multiplication improve to get 5× overall performance?
– 5× overall performance means $T_{\text{improved}} = 100\,\text{s} / 5 = 20\,\text{s}$
– $20 = \frac{80}{n} + 20$, which requires $\frac{80}{n} = 0$
This cannot be done!
52
• Basic Computer Organization
• Hierarchy of Computer Abstractions
• Instruction Set Architecture
– Hardware/Software interface
• About Performance
– Execution time, CPI, instruction count,
Amdahl's law, …
• Fallacies & Pitfalls