ECE-3056-B Quiz-1 Topic Areas
John Copeland, Feb. 14, 2014

01,02 Introduction
Moore's Law – Transistor counts double (x2) every 18 months, which compounds to about x10 every 5 years (2^(60/18) ≈ 10). Feature size shrinks by x0.1 every 12 to 15 years: 10 um in 1970, 0.02 um in 2010.
Dennard's Law (Dennard scaling) – Power density stays constant as transistors shrink. Not valid after about 2000 (why?).
Power Wall: increasing energy efficiency and using parallel computing have become essential.
Memory Wall: the ratio of DRAM performance to CPU performance was 1.0 in 1975; today it is about 300 times smaller. Hence L1, L2, L3 caches and virtual memory.
IC Cost: put as much as possible on one chip, because sockets (off-chip connections) are slow and expensive. How much fits is limited by thermal dissipation (heat) and by defect density (which improves faster).
Multicore processors – more than one "CPU" core on a chip (IC). Programming, synchronization, and memory systems become tougher, but this must be done.

02 Study Guide
• Moore’s Law
• Technology Trends
– Explain the shift to power- and energy-efficient computing
• Understanding Cost
– What are the major elements of cost?
• Multicore processor
– Distinguishing features
• Basic Components of a Modern Processor

02 Glossary
• Energy efficiency
• Dennard Scaling
• Die yield
• Feature size
• Heterogeneity
• Moore’s Law
• Multicore architecture
• Memory Wall
• Performance scaling
• Parallel programming
• Power efficiency
• Power Wall
• Tick-tock development model
• Wafer

03 MIPS Instruction Set Architecture
Choosing the instruction set defines what a CPU can do directly in hardware, without extra code.
A RISC (Reduced Instruction Set Computer) is easier to design, runs faster, and uses less chip area.
A CISC (Complex Instruction Set Computer) uses microcode to perform a complex action per instruction.
High-level code depends on the compiler to generate "machine code".
MIPS has only three basic instruction types and formats (R, I, J), which makes decoding the control signals from the instruction bits much easier.
Virtually all code uses branches and jumps: they implement decision logic and allow calls to subroutines and DLLs.
MIPS is a "Von Neumann" architecture: instructions and data are both stored in memory.
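To make the R/I/J point concrete, here is a minimal Python sketch that splits a 32-bit MIPS instruction word into its fields (the function name decode and the dictionary layout are illustrative, not from the course materials). Because the opcode always sits in bits 31-26, one comparison picks the format, and every other field is a fixed shift and mask. Assumptions: opcode 0 is R-format, opcodes 2 and 3 (j, jal) are J-format, and everything else is treated as I-format with a sign-extended immediate. Real MIPS zero-extends the immediate for logical instructions such as andi/ori and has other opcode groups (e.g. coprocessor) that this sketch ignores.

```python
def decode(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into R-, I-, or J-format fields.

    Simplified sketch: any opcode other than 0, 2, or 3 is treated as I-format,
    and the 16-bit immediate is always sign-extended.
    """
    opcode = (word >> 26) & 0x3F              # bits 31..26 in every format
    if opcode == 0:                           # R-format: register-register ops
        return {
            "format": "R", "opcode": opcode,
            "rs": (word >> 21) & 0x1F,        # bits 25..21
            "rt": (word >> 16) & 0x1F,        # bits 20..16
            "rd": (word >> 11) & 0x1F,        # bits 15..11
            "shamt": (word >> 6) & 0x1F,      # bits 10..6
            "funct": word & 0x3F,             # bits 5..0
        }
    if opcode in (2, 3):                      # J-format: j (2) and jal (3)
        return {"format": "J", "opcode": opcode, "target": word & 0x3FFFFFF}
    imm = word & 0xFFFF                       # I-format: 16-bit immediate
    if imm & 0x8000:                          # sign extension of bit 15
        imm -= 0x10000
    return {
        "format": "I", "opcode": opcode,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": imm,
    }


if __name__ == "__main__":
    print(decode(0x012A4020))   # add  $t0, $t1, $t2
    print(decode(0x2128FFFF))   # addi $t0, $t1, -1
```

For example, 0x012A4020 is add $t0, $t1, $t2, so decode reports rs=9 ($t1), rt=10 ($t2), rd=8 ($t0) and funct=0x20; 0x2128FFFF (addi $t0, $t1, -1) comes out as I-format with imm=-1 after sign extension.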
Below Your Program
• Application software
– Written in a high-level language (HLL)
• System software
– Compiler: translates HLL code to machine code
– Operating System: service code
• Handling input/output
• Managing memory and storage
• Scheduling tasks & sharing resources
• Hardware
– Processor, memory, I/O controllers

Design Principles
• Design Principle 1: Simplicity favors regularity.
• Design Principle 2: Smaller is faster.
– e.g. MIPS has only 32 registers, which are much faster than main memory with its billions of locations
• Design Principle 3: Good design demands compromise.
• Design Principle 4: Make the common case fast.
– Small constants are common
– An immediate operand avoids a load instruction

03a Glossary
• Basic block
• Big endian
• Binary compatibility
• Byte aligned memory access
• Data directives
• Destination operand
• Frame pointer
• General purpose registers
• Global pointer
• I-format
• R-format
• J-format
• Immediate operand
• Instruction encoding
• Instruction format
• Instruction set architecture
• Little Endian
• Machine code (or language)
• Memory map

Glossary (cont.)
• Native instructions
• Orthogonal ISA
• PC-relative addressing
• Pseudo instructions
• Sign extension
• Source operand
• Stack pointer
• System software vs. application software
• Unsigned vs. signed instructions
• Word aligned memory access

The Von Neumann execution model has these subparts: a processor with an ALU and registers; control with a program counter (PC); and memory. The meaning of the term has evolved to mean a stored-program computer in which an instruction fetch and a data operation cannot occur at the same time because they share a common bus. This is referred to as the Von Neumann bottleneck, and it often limits the performance of the system.
– Links are to Wikipedia

03b Procedure Calls – Assembly
Basic functionality:
• Transfer of parameters & control to the procedure
• Transfer of results & control back to the calling program
• Support for nested procedures
Procedure code is read from the "Text" memory section, which is not writable.
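Why nested procedures need support can be seen in a toy model of jal and jr, sketched here in Python. The class MiniMips, its method names, the starting $sp, and the code addresses are all hypothetical, chosen only for illustration; this is not a real simulator. Because jal overwrites $ra, a procedure that itself calls another procedure must save its own $ra on the stack first and restore it before returning.

```python
class MiniMips:
    """Toy model of MIPS call/return state: $ra, $sp, and a word-addressed stack."""

    def __init__(self, sp: int = 0x7FFFFFFC):   # assumed starting $sp (hypothetical)
        self.ra = 0
        self.sp = sp
        self.stack = {}                          # address -> saved word

    def jal(self, pc: int) -> None:
        # jal stores the address of the next instruction in $ra
        self.ra = pc + 4

    def push_ra(self) -> None:
        # addi $sp, $sp, -4 ; sw $ra, 0($sp)   (stack grows toward low addresses)
        self.sp -= 4
        self.stack[self.sp] = self.ra

    def pop_ra(self) -> None:
        # lw $ra, 0($sp) ; addi $sp, $sp, 4
        self.ra = self.stack.pop(self.sp)
        self.sp += 4


def nested_call_demo(save_ra: bool) -> list:
    """main calls A, A calls B; return the addresses that B and A jump back to."""
    cpu = MiniMips()
    returns = []
    cpu.jal(0x00400000)        # main: jal A  -> $ra = 0x00400004
    if save_ra:
        cpu.push_ra()          # A's prologue saves main's return address
    cpu.jal(0x00400100)        # A: jal B     -> $ra overwritten with 0x00400104
    returns.append(cpu.ra)     # B: jr $ra    -> back into A
    if save_ra:
        cpu.pop_ra()           # A's epilogue restores main's return address
    returns.append(cpu.ra)     # A: jr $ra    -> back to main only if $ra was saved
    return returns


if __name__ == "__main__":
    print([hex(a) for a in nested_call_demo(True)])
    print([hex(a) for a in nested_call_demo(False)])
```

With save_ra=False, A's "return" goes to its own call site (0x00400104) instead of back to main, which is why the return address belongs in the stack frame of any procedure that makes further calls.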
Arguments are passed to the procedure in $a0, $a1, $a2, $a3 (argument registers); results come back in $v0, $v1 (result registers). Procedure code can also read/write static memory. Local variables are saved on the stack (they disappear on return).
[Figure: stack frames. The old stack frame lies between the old $fp and $sp. The new stack frame, between the new $fp and the new $sp, holds the argument registers, return address, saved registers, and local variables. The stack grows toward low addresses.]

[Figure: translation hierarchy. C program (code) → compiler → assembly program, or assembler code written directly (QtSPIM) → assembler → object module → linker (with object library) → executable → loader → memory. An object file contains a header (information), text segment (code), data segment, relocation information, and symbol table. QtSPIM shows both the native instructions and the assembled binary.]
The loader allocates pages of physical memory and maps them to the virtual pages that the code refers to.

Dynamic Linking of Libraries
• Only link/load a library procedure when it is called
– Requires procedure code to be relocatable
– Avoids image bloat caused by static linking of all (transitively) referenced libraries
– Automatically picks up new library versions
• Where do DLLs come from?
• Why is Dynamic Linking important?
– DLLs are stored on disk, in files (not in physical memory). The dynamic loader must allocate physical memory when they are needed, and free that memory when they are no longer needed (even though the process continues to run).

03b Glossary
• Argument registers
• Caller save registers
• Callee save registers
• Disassembly
• Frame pointer
• Independent compilation
• Labels: local, global, external
• Linker/loader
• Linking: static vs. dynamic vs. lazy
• Native instructions
• Nested procedures
• Object file
• One/two pass assembly
• Procedure invocation
• Pseudo instructions
• Relocatable code
• Stack frame
• Stack pointer
• Symbol table

04a Arithmetic (and Logic, ALU)
Operations on integers:
• Bit-wise logic operations: AND, OR, XOR (⊕), shift (left or right; arithmetic or logical)
• Addition and subtraction: A - B is done as A + (-B). Negate B by flipping all bits (B ⊕ 1111…1) and adding 1 (LSB carry-in = 1).
• Multiplication and division (more complicated; need an accumulator twice as wide)
• Dealing with overflow (exception handler: crash, or substitute the largest number)
Operations on floating-point real numbers:
• Representation and operations

Instruction streams vs. data streams (Flynn's taxonomy):
                          Single data stream   Multiple data streams
Single instruction        SISD                 SIMD
Multiple instruction      MISD                 MIMD
• SISD: today's serial computing cores (von Neumann model)
• SIMD: single instruction, multiple data stream computing, e.g. Intel AVX or SSE
• MIMD: today's multicore

Vector Computation
• Operate on multiple data elements (vectors) at a time
• Flexible definition/use of registers
• Registers hold integers, floats (SP), doubles (DP)
A 128-bit register can hold:
– 1 x 128-bit integer
– 2 x 64-bit double precision
– 4 x 32-bit single precision
– 8 x 16-bit short integers

04a Glossary
• Co-processor
• Data parallelism
• Data parallel computation vs. vector computation
• Instruction set extensions
• Overflow
• MIMD
• Precision
• SIMD
• Saturating arithmetic
• Signed arithmetic support
• Unsigned arithmetic support
• Vector processing

04b Energy, Power Dissipation (Heat)
[Figure: CMOS inverter. A PMOS transistor connects Vout to Vdd, an NMOS transistor connects Vout to Ground, and both gates are driven by Vin.]
• Dynamic Power Consumption
– Caused by switching transitions (the cost of switching state)
• Static Power Consumption
– Caused by leakage currents in the absence of any switching activity
• Power consumption per transistor changes with each technology generation
– No longer reducing at the same rate
– What happens to power density?
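One way to answer "What happens to power density?" is the idealized scaling argument below, written as a small Python sketch. The function name is hypothetical, and the proportionalities are the textbook constant-field (Dennard) scaling assumptions rather than course data; leakage (static power) is ignored. Shrinking features by a factor k scales capacitance by 1/k, switching frequency by k, and transistor density by k². If supply voltage also scales by 1/k, power density stays constant; if voltage stops scaling, power density grows by k².

```python
def power_density_ratio(k: float, voltage_scales: bool) -> float:
    """Relative power density after shrinking features by a factor k (k > 1).

    Idealized sketch: dynamic power only, proportional to C * V**2 * f.
    """
    c = 1 / k                               # capacitance per transistor ~ 1/k
    v = 1 / k if voltage_scales else 1.0    # supply voltage ~ 1/k only under Dennard scaling
    f = k                                   # switching frequency ~ k (shorter gate delays)
    power_per_transistor = c * v ** 2 * f   # dynamic power ~ C V^2 f
    transistors_per_area = k ** 2           # area per transistor ~ 1/k^2
    return power_per_transistor * transistors_per_area


if __name__ == "__main__":
    k = 1.4  # roughly one technology generation (area per transistor about halved)
    print(power_density_ratio(k, voltage_scales=True))    # about 1.0
    print(power_density_ratio(k, voltage_scales=False))   # about 1.96
```

With k = 1.4, holding the voltage fixed roughly doubles power density (1.4² ≈ 2) every generation, which is the heat problem behind the Power Wall.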
Energy converted to heat in each half-cycle: \( E = \tfrac{1}{2} C_L V_{DD}^2 \)
[Figure: input to a CMOS inverter (voltage from 0 to VDD over time, period T) and the supply current iDD into the load capacitance CL; panels: Output Capacitor Charging, Output Capacitor Discharging.]

04b Glossary
• Dynamic Energy
• Dynamic Power
• Load capacitance
• Static Energy
• Static Power
• Time constant
• Threshold voltage
• Switching energy
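The half-cycle energy formula can be turned into dynamic and static power estimates with a short Python sketch. The function names and the sample values (a 1 fF load, 1.0 V and 0.5 V supplies, 10^9 transitions per second) are hypothetical. It assumes dynamic power is the energy per output transition times the number of transitions per second, and static power is VDD times a given leakage current.

```python
def switching_energy(c_load: float, vdd: float) -> float:
    """Heat (J) dissipated per output transition (charge or discharge): 0.5 * C * Vdd**2."""
    return 0.5 * c_load * vdd ** 2


def dynamic_power(c_load: float, vdd: float, transitions_per_s: float) -> float:
    """Dynamic power (W): energy per transition times transitions per second."""
    return switching_energy(c_load, vdd) * transitions_per_s


def static_power(vdd: float, i_leak: float) -> float:
    """Static power (W) from leakage current, drawn even with no switching."""
    return vdd * i_leak


if __name__ == "__main__":
    c = 1e-15  # hypothetical 1 fF load capacitance
    print(switching_energy(c, 1.0))                             # about 5e-16 J
    print(switching_energy(c, 0.5) / switching_energy(c, 1.0))  # 0.25
    print(dynamic_power(c, 1.0, 1e9))                           # about 5e-7 W
```

Because energy grows with the square of VDD, halving the supply voltage cuts the energy per transition to one quarter, while static power depends only on VDD and the leakage current.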