ICC Module 3 Lesson 1 – Computer Architecture © 2015 Ph. Janson

Information, Computing & Communication
Computer Architecture
Clip 7 – Architectural Parallelism
School of Computer Science & Communications
P. Ienne (charts), Ph. Janson (commentary)

Outline
►Clip 0 – Introduction
►Clip 1 – Software technology – Assembler language
    Algorithms
    Registers
    Data instructions
    Instruction numbering
    Control instructions
►Clip 2 – Hardware architecture – Von Neumann’s stored program computer architecture
    Data storage and processing
    Control storage and processing
►Clip 3 – Hardware design – Instruction encoding
►Hardware implementation – Transistor technology
    Clip 4 – Computing circuits
    Clip 5 – Memory circuits
►Hardware performance
    Clip 6 – Logic parallelism
    Clip 7 – Architecture parallelism

How can one increase performance beyond transistor speed?
= Reduce delay: the time spent waiting to get a result
= Increase throughput: the number of results per time unit
Two simple examples of performance increase:
1. At the circuit level: reducing the delay of an adder
2. At the processor structure level: increasing the throughput of instructions => this clip

Our processor …
    103: load r1, 0
    104: load r2, -21
    105: add  r3, r7, r4
    106: mult r2, r5, r9
    107: sub  r8, r7, r9
    108: load r9, r4
    109: add  r3, r2, r1
    110: sub  r5, r3, r4
    111: load r2, r3
    112: add  r1, r2, -1
    113: add  r8, r1, -1
    114: div  r4, r1, r7
    115: load r2, r4
… normally executes one instruction at a time
Arithm. unit, in order of execution:
    Load r1, 0
    Load r2, -21
    Add  r3, r7, r4
    Mult r2, r5, r9
    Sub  r8, r7, r9
    Load r9, r4
Can we do better?

Doubling the throughput of our processor
We could imagine executing two instructions at a time!
    Arithm. unit        Arithm. unit
    Load r1, 0          Load r2, -21
    Add  r3, r7, r4     Mult r2, r5, r9
    Sub  r8, r7, r9     Load r9, r4
Do you see the problem?!

The problem is that the 2nd instruction (110) needs the value of r3 computed by the 1st instruction (109)!
Unless one is careful, the result will be wrong!
    Arithm. unit        Arithm. unit
    Add  r3, r2, r1     Sub  r5, r3, r4

In practice one executes between one and two instructions at a time, and then the result is correct
    Arithm. unit        Arithm. unit
    Add  r3, r2, r1     NOTHING
    Sub  r5, r3, r4     Load r2, r3
    Add  r1, r2, -1     NOTHING
    Add  r8, r1, -1     Div  r4, r1, r7

A “superscalar” processor
[Figure: register bank and dependency detection feeding four arithmetic units]
►All modern processors for portable computers as well as servers include this
►In addition, they reorder instructions and execute them before knowing whether they need to be executed (for instance after an instruction such as jump_lte)

Performance engineering (2)
►One can modify the structure of a system to execute programs faster
►One can add resources to processors to make them faster
►Or one can use simpler processors to save energy
This is an example of computer architecture, which is another branch of Computer Engineering
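
Sketch: dependency detection for two arithmetic units
The "between one and two instructions at a time" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual hardware logic. The names Instr, can_pair and dual_issue are hypothetical. "load r2, r3" is assumed to copy r3 into r2. Two neighbouring instructions are assumed to be pairable unless the second one reads, or also writes, the register written by the first.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    addr: int     # instruction number, as in the example program
    op: str       # mnemonic, kept only for readability
    dest: str     # register written by the instruction
    srcs: tuple   # registers read; constants such as -1 are omitted


# Instructions 109 to 114 of the example program.
PROGRAM = [
    Instr(109, "add", "r3", ("r2", "r1")),
    Instr(110, "sub", "r5", ("r3", "r4")),
    Instr(111, "load", "r2", ("r3",)),   # assumption: load copies r3 into r2
    Instr(112, "add", "r1", ("r2",)),
    Instr(113, "add", "r8", ("r1",)),
    Instr(114, "div", "r4", ("r1", "r7")),
]


def can_pair(first, second):
    """Return True if `second` may run in the same cycle as `first`."""
    reads_result = first.dest in second.srcs   # second needs first's result
    same_dest = first.dest == second.dest      # both write the same register
    return not (reads_result or same_dest)


def dual_issue(program):
    """Greedy in-order schedule: one (unit 1, unit 2) pair per cycle."""
    schedule = []
    i = 0
    while i < len(program):
        first = program[i]
        if i + 1 < len(program) and can_pair(first, program[i + 1]):
            schedule.append((first.addr, program[i + 1].addr))
            i += 2
        else:
            schedule.append((first.addr, None))  # second unit does NOTHING
            i += 1
    return schedule


if __name__ == "__main__":
    for cycle, (u1, u2) in enumerate(dual_issue(PROGRAM), start=1):
        print(cycle, u1, u2 if u2 is not None else "NOTHING")
```

Under these assumptions the six instructions finish in four cycles instead of six, with the second unit idle in cycles 1 and 3. The sketch keeps instructions in program order; it does not attempt the reordering or the execution past jumps that real superscalar processors perform.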