Clip 7 - EPFL Moodle

ICC Module 3 Lesson 1 – Computer Architecture
© 2015 Ph. Janson
Information, Computing & Communication
Computer Architecture
Clip 7 – Architectural Parallelism
School of Computer Science & Communications
P. Ienne (charts), Ph. Janson (commentary)
Outline
►Clip 0 – Introduction
►Clip 1 – Software technology – Assembler language
  - Algorithms
  - Registers
  - Data instructions
  - Instruction numbering
  - Control instructions
►Clip 2 – Hardware architecture – Von Neumann’s stored program computer architecture
  - Data storage and processing
  - Control storage and processing
►Clip 3 – Hardware design – Instruction encoding
►Hardware implementation – Transistor technology
  - Clip 4 – Computing circuits
  - Clip 5 – Memory circuits
►Hardware performance
  - Clip 6 – Logic parallelism
  - Clip 7 – Architecture parallelism
How can one increase performance beyond transistor speed?
►Reduce delay (the time spent waiting to get a result)
►Increase throughput (the number of results per time unit)
Two simple examples of performance increase:
1. At the circuit level: reducing the delay of an adder
2. At the processor structure level: increasing the throughput of instructions (the subject of this clip)
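The difference between delay and throughput can be made concrete with a little arithmetic. The sketch below assumes, hypothetically, that every instruction takes one time unit (its delay) and that dependencies between instructions never get in the way; `total_time` and `throughput` are illustrative names, not part of the course material.

```python
import math

def total_time(n_results: int, per_cycle: int, cycle_time: float) -> float:
    """Time to obtain n_results when per_cycle results complete every cycle."""
    return math.ceil(n_results / per_cycle) * cycle_time

def throughput(n_results: int, elapsed: float) -> float:
    """Number of results per time unit."""
    return n_results / elapsed

CYCLE_TIME = 1.0  # hypothetical delay of one instruction, in arbitrary time units
N = 13            # instructions 103 to 115 of the program used in this clip

for per_cycle in (1, 2):
    t = total_time(N, per_cycle, CYCLE_TIME)
    print(per_cycle, t, round(throughput(N, t), 2))
# 1 13.0 1.0
# 2 7.0 1.86
```

Finishing two instructions per cycle does not shorten the delay of any single instruction, yet it nearly doubles throughput (13/7, about 1.86 results per time unit instead of 1). This is the ideal case; dependencies between instructions prevent it from always being reached.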
Our processor …

103: load r1, 0
104: load r2, -21
105: add r3, r7, r4
106: mult r2, r5, r9
107: sub r8, r7, r9
108: load r9, r4
109: add r3, r2, r1
110: sub r5, r3, r4
111: load r2, r3
112: add r1, r2, -1
113: add r8, r1, -1
114: div r4, r1, r7
115: load r2, r4

… executes normally, one instruction at a time.
[Figure: the instructions enter a single arithmetic unit one after another: load r1, 0; load r2, -21; add r3, r7, r4; mult r2, r5, r9; sub r8, r7, r9; load r9, r4]
Can we do better?
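To make "one instruction at a time" concrete, here is a minimal Python sketch that runs the listing sequentially. The tuple encoding, the initial register contents (r1 = 1 through r9 = 9) and the instruction semantics are assumptions made for illustration: in particular, `load` is taken to simply copy its operand (a constant or a register) into the destination, and `div` is taken to be floor division; the course's actual semantics may differ.

```python
# Hypothetical encoding of the listing: (address, opcode, destination, sources).
# A source starting with "r" names a register; anything else is a constant.
PROGRAM = [
    (103, "load", "r1", ["0"]),
    (104, "load", "r2", ["-21"]),
    (105, "add", "r3", ["r7", "r4"]),
    (106, "mult", "r2", ["r5", "r9"]),
    (107, "sub", "r8", ["r7", "r9"]),
    (108, "load", "r9", ["r4"]),
    (109, "add", "r3", ["r2", "r1"]),
    (110, "sub", "r5", ["r3", "r4"]),
    (111, "load", "r2", ["r3"]),
    (112, "add", "r1", ["r2", "-1"]),
    (113, "add", "r8", ["r1", "-1"]),
    (114, "div", "r4", ["r1", "r7"]),
    (115, "load", "r2", ["r4"]),
]

# Assumed semantics: "load" copies its single operand; "div" is floor division.
OPS = {
    "load": lambda a: a,
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mult": lambda a, b: a * b,
    "div": lambda a, b: a // b,
}

def value(regs, operand):
    """Return a register's content, or the constant the operand spells out."""
    return regs[operand] if operand.startswith("r") else int(operand)

def run_sequential(program, regs):
    """Execute the instructions one at a time, in address order."""
    regs = dict(regs)  # leave the caller's registers untouched
    for _addr, op, dest, srcs in program:
        regs[dest] = OPS[op](*[value(regs, s) for s in srcs])
    return regs

INITIAL = {f"r{i}": i for i in range(1, 10)}  # hypothetical: r1 = 1, ..., r9 = 9
final = run_sequential(PROGRAM, INITIAL)
print(final["r3"], final["r5"], final["r8"])  # 45 41 43
```

Each instruction reads its operands only after the previous one has written its result, which is what makes this one-at-a-time execution trivially correct.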
Doubling the throughput of our processor
We could imagine executing two instructions at a time!
[Figure: two arithmetic units, each taking every other instruction. Unit 1 receives load r1, 0; add r3, r7, r4; sub r8, r7, r9. Unit 2 receives load r2, -21; mult r2, r5, r9; load r9, r4.]
Do you see the problem?!
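One way to see the problem is to simulate the naive idea: take the instructions two by two and let both members of a pair read their operands before either writes its result. The sketch below encodes the listing as Python tuples, assumes hypothetical semantics (`load` copies its operand, `div` is floor division) and hypothetical initial values r1 = 1 through r9 = 9, and compares a one-at-a-time run with a naive two-at-a-time run.

```python
# Hypothetical encoding: (address, opcode, destination, sources).
PROGRAM = [
    (103, "load", "r1", ["0"]),
    (104, "load", "r2", ["-21"]),
    (105, "add", "r3", ["r7", "r4"]),
    (106, "mult", "r2", ["r5", "r9"]),
    (107, "sub", "r8", ["r7", "r9"]),
    (108, "load", "r9", ["r4"]),
    (109, "add", "r3", ["r2", "r1"]),
    (110, "sub", "r5", ["r3", "r4"]),
    (111, "load", "r2", ["r3"]),
    (112, "add", "r1", ["r2", "-1"]),
    (113, "add", "r8", ["r1", "-1"]),
    (114, "div", "r4", ["r1", "r7"]),
    (115, "load", "r2", ["r4"]),
]

# Assumed semantics, for illustration only.
OPS = {
    "load": lambda a: a,
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mult": lambda a, b: a * b,
    "div": lambda a, b: a // b,
}

def value(regs, operand):
    return regs[operand] if operand.startswith("r") else int(operand)

def run_sequential(program, regs):
    """Reference: one instruction at a time."""
    regs = dict(regs)
    for _addr, op, dest, srcs in program:
        regs[dest] = OPS[op](*[value(regs, s) for s in srcs])
    return regs

def run_naive_pairs(program, regs):
    """Two consecutive instructions per cycle, ignoring dependencies:
    both read their operands first, then both write their results."""
    regs = dict(regs)
    for i in range(0, len(program), 2):
        results = [(dest, OPS[op](*[value(regs, s) for s in srcs]))
                   for _addr, op, dest, srcs in program[i:i + 2]]
        for dest, result in results:
            regs[dest] = result
    return regs

INITIAL = {f"r{i}": i for i in range(1, 10)}  # hypothetical initial values
seq = run_sequential(PROGRAM, INITIAL)
naive = run_naive_pairs(PROGRAM, INITIAL)
wrong = sorted(r for r in seq if seq[r] != naive[r])
print(wrong, seq["r5"], naive["r5"])  # ['r5'] 41 7
```

Under these assumptions only r5 comes out wrong: in the pair 109/110, sub r5, r3, r4 reads the old value of r3 instead of the one add r3, r2, r1 is computing. The pair 111/112 has the same kind of dependency (on r2) but happens to give the right answer here only because r2 already held the same value.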
The problem is that the 2nd instruction needs a value computed by the 1st instruction! Unless one is careful, the result will be wrong!
[Figure: add r3, r2, r1 (instruction 109) and sub r5, r3, r4 (instruction 110) sent to the two arithmetic units at the same time; 110 reads r3 before 109 has written its new value]
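The condition described here, a later instruction reading a register that an earlier one writes, is usually called a read-after-write dependency. A minimal check for it only needs to know which registers each instruction reads and writes. The sketch below applies it to the naive pairs; the tuple encoding and the function names are hypothetical.

```python
# Hypothetical encoding of the listing: (address, opcode, destination, sources).
PROGRAM = [
    (103, "load", "r1", ["0"]),
    (104, "load", "r2", ["-21"]),
    (105, "add", "r3", ["r7", "r4"]),
    (106, "mult", "r2", ["r5", "r9"]),
    (107, "sub", "r8", ["r7", "r9"]),
    (108, "load", "r9", ["r4"]),
    (109, "add", "r3", ["r2", "r1"]),
    (110, "sub", "r5", ["r3", "r4"]),
    (111, "load", "r2", ["r3"]),
    (112, "add", "r1", ["r2", "-1"]),
    (113, "add", "r8", ["r1", "-1"]),
    (114, "div", "r4", ["r1", "r7"]),
    (115, "load", "r2", ["r4"]),
]

def registers_read(instr):
    """Registers an instruction reads (constants such as 0 or -1 are ignored)."""
    _addr, _op, _dest, srcs = instr
    return {s for s in srcs if s.startswith("r")}

def needs_result_of(second, first):
    """Read-after-write check: does `second` read the register `first` writes?"""
    return first[2] in registers_read(second)

# The naive scheme pairs 103/104, 105/106, ..., 113/114 (115 runs alone).
pairs = [(PROGRAM[i], PROGRAM[i + 1]) for i in range(0, len(PROGRAM) - 1, 2)]
conflicts = [(a[0], b[0]) for a, b in pairs if needs_result_of(b, a)]
print(conflicts)  # [(109, 110), (111, 112)]
```

Besides 109/110, the check also flags 111/112, where add r1, r2, -1 reads the r2 that load r2, r3 writes.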
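In practice, one executes between one and two instructions at a time, and then the result is correct.
[Figure: instructions 109 to 114 on the two arithmetic units, one row per step; the second unit does NOTHING whenever the next instruction needs the result of the current one (110 needs r3 from 109, 113 needs r1 from 112)]
Step   Arithm. unit 1    Arithm. unit 2
1      add r3, r2, r1    NOTHING
2      sub r5, r3, r4    load r2, r3
3      add r1, r2, -1    NOTHING
4      add r8, r1, -1    div r4, r1, r7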
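The rule behind this schedule can be sketched as a greedy, in-order scheduler for two arithmetic units: the next instruction joins the current one only if it does not read the register the current one writes. The encoding, the semantics (`load` copies its operand, `div` is floor division), the initial register values and the function names are all hypothetical. Within a step, both instructions read before either writes, and writes are applied in program order.

```python
# Hypothetical encoding: (address, opcode, destination, sources).
PROGRAM = [
    (103, "load", "r1", ["0"]),
    (104, "load", "r2", ["-21"]),
    (105, "add", "r3", ["r7", "r4"]),
    (106, "mult", "r2", ["r5", "r9"]),
    (107, "sub", "r8", ["r7", "r9"]),
    (108, "load", "r9", ["r4"]),
    (109, "add", "r3", ["r2", "r1"]),
    (110, "sub", "r5", ["r3", "r4"]),
    (111, "load", "r2", ["r3"]),
    (112, "add", "r1", ["r2", "-1"]),
    (113, "add", "r8", ["r1", "-1"]),
    (114, "div", "r4", ["r1", "r7"]),
    (115, "load", "r2", ["r4"]),
]

OPS = {  # assumed semantics, for illustration only
    "load": lambda a: a,
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mult": lambda a, b: a * b,
    "div": lambda a, b: a // b,
}

def value(regs, operand):
    return regs[operand] if operand.startswith("r") else int(operand)

def reads(instr):
    return {s for s in instr[3] if s.startswith("r")}

def schedule_two_units(program):
    """Pair each instruction with the next one unless the next one
    reads the register the first one writes."""
    cycles, i = [], 0
    while i < len(program):
        first = program[i]
        if i + 1 < len(program) and first[2] not in reads(program[i + 1]):
            cycles.append([first, program[i + 1]])
            i += 2
        else:
            cycles.append([first])  # the second unit does NOTHING this step
            i += 1
    return cycles

def run(cycles, regs):
    """Within a step, all instructions read their operands, then all write."""
    regs = dict(regs)
    for group in cycles:
        results = [(dest, OPS[op](*[value(regs, s) for s in srcs]))
                   for _addr, op, dest, srcs in group]
        for dest, result in results:
            regs[dest] = result
    return regs

INITIAL = {f"r{i}": i for i in range(1, 10)}  # hypothetical initial values
cycles = schedule_two_units(PROGRAM)
print([[instr[0] for instr in group] for group in cycles])
# [[103, 104], [105, 106], [107, 108], [109], [110, 111], [112], [113, 114], [115]]

# One instruction per step is the sequential reference; both runs must agree.
sequential = run([[instr] for instr in PROGRAM], INITIAL)
assert run(cycles, INITIAL) == sequential
```

On the whole listing this takes 8 steps instead of 13, and the steps for instructions 109 to 114 match the slide (109 alone, 110 with 111, 112 alone, 113 with 114).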
A “superscalar” processor
[Figure: register bank, dependency detection, and four arithmetic units]
►All modern processors, for portable computers as well as for servers, are built this way
►In addition, they reorder instructions and execute some of them before knowing whether they need to be executed at all (for instance, the instructions that follow a conditional jump such as jump_lte)
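The "dependency detection" block can be sketched, under simplifying assumptions, for any number of arithmetic units: issue instructions in program order and close the current step's group as soon as an instruction needs a result being computed in that same step. The encoding, the function names and the restriction to read-after-write checks are assumptions for illustration; this sketch does not model the reordering or speculative execution that real superscalar processors add.

```python
# Hypothetical encoding of the listing: (address, opcode, destination, sources).
PROGRAM = [
    (103, "load", "r1", ["0"]),
    (104, "load", "r2", ["-21"]),
    (105, "add", "r3", ["r7", "r4"]),
    (106, "mult", "r2", ["r5", "r9"]),
    (107, "sub", "r8", ["r7", "r9"]),
    (108, "load", "r9", ["r4"]),
    (109, "add", "r3", ["r2", "r1"]),
    (110, "sub", "r5", ["r3", "r4"]),
    (111, "load", "r2", ["r3"]),
    (112, "add", "r1", ["r2", "-1"]),
    (113, "add", "r8", ["r1", "-1"]),
    (114, "div", "r4", ["r1", "r7"]),
    (115, "load", "r2", ["r4"]),
]

def reads(instr):
    """Registers read by an instruction (constants ignored)."""
    return {s for s in instr[3] if s.startswith("r")}

def schedule(program, width):
    """In-order issue of up to `width` instructions per step. A step's group is
    closed as soon as the next instruction reads a register written earlier in
    the same group (a read-after-write dependency)."""
    cycles, i = [], 0
    while i < len(program):
        group, written = [], set()
        while i < len(program) and len(group) < width:
            addr, _op, dest, _srcs = program[i]
            if reads(program[i]) & written:
                break  # must wait for a result computed in this step
            group.append(addr)
            written.add(dest)
            i += 1
        cycles.append(group)
    return cycles

for width in (1, 2, 4):
    print(width, len(schedule(PROGRAM, width)))
# 1 13
# 2 8
# 4 6
```

In this example, going from two to four units only saves two more steps (8 to 6): the dependencies between instructions, not the number of units, quickly become the limit.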
Performance engineering (2)
►One can modify the structure of a system to execute programs faster
►One can add resources to processors to make them faster
►Or one can use simpler processors to save energy
This is an example of computer architecture, which is another branch of Computer Engineering.