Quiz-1 Topics

ECE-3056-B
Quiz-1 Topic Areas
John Copeland
Feb. 14, 2014
01, 02 Introduction
Moore's Law – Transistor Counts x2 every 18 months, x10 every 5 years.
Feature size x0.1 every 12 to 15 years: 10 um (1970) to 0.02 um (2010).
Dennard's Law (Dennard scaling) – power density stays constant as transistors shrink. Not valid after 2000 (why?).
Power Wall: improving energy efficiency and using parallel computing are now important.
Memory Wall: the ratio of DRAM performance to CPU performance was 1.0 in 1975;
the ratio is now 300 times smaller. Related: L1, L2, L3 cache, Virtual Memory.
IC Cost: put as much as possible on one chip, because sockets are slow and expensive.
Integration is limited by thermal dissipation (heat) and defect density (which improves
faster).
Multicore processors – more than one "CPU" core on a chip (IC).
Programming, synchronization, and memory systems are tough, but must be dealt with.
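The Moore's Law and feature-size figures quoted above can be sanity-checked with a little arithmetic. The sketch below is only a check of those numbers; the function names are illustrative and not from the course material.

```python
import math

def growth_factor(doubling_months: float, years: float) -> float:
    """Growth factor after `years` if a quantity doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)

def years_per_tenfold_shrink(start_um: float, end_um: float, elapsed_years: float) -> float:
    """Average number of years needed for the feature size to shrink by 10x."""
    tenfold_steps = math.log10(start_um / end_um)   # how many 10x reductions occurred
    return elapsed_years / tenfold_steps

# Transistor count: x2 every 18 months, evaluated over 5 years
five_year_factor = growth_factor(18, 5)

# Feature size: 10 um (1970) down to 0.02 um (2010)
shrink_period = years_per_tenfold_shrink(10.0, 0.02, 2010 - 1970)

print(f"x{five_year_factor:.1f} in 5 years")
print(f"10x shrink every {shrink_period:.1f} years")
```

Doubling every 18 months works out to roughly x10 in 5 years, and the 1970 and 2010 feature sizes imply a 10x shrink a little under every 15 years, consistent with the "12 to 15 years" range.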
02 Study Guide
• Moore’s Law
• Technology Trends
– Explain the shift to power- and energy-efficient
computing
• Understanding Cost
– What are the major elements of cost?
• Multicore processor
– Distinguishing features
• Basic Components of a Modern Processor
02 Glossary
• Energy efficiency
• Dennard Scaling
• Die yield
• Feature size
• Heterogeneity
• Moore’s Law
• Multicore architecture
• Memory Wall
• Performance scaling
• Parallel programming
• Power efficiency
• Power Wall
• Tick-tock development model
• Wafer
03 MIPS Instruction Set Architecture
Choosing the instruction set defines what a CPU can do natively (without additional code).
A RISC (Reduced Instruction Set Computer) is easier to design, runs faster, and takes less chip area.
A CISC (Complex Instruction Set Computer) uses microcode to carry out a complex action per instruction.
High-level code depends on the compiler to generate "machine code".
MIPS has only three basic instruction types and formats (R, I, J).
This makes decoding control signals from the instruction bits much easier.
All code uses branches and jumps to implement logic and to support subroutines and DLLs.
MIPS is a "Von Neumann" architecture: instructions and data are both stored in memory.
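To illustrate why three fixed formats make decoding easy, here is a minimal sketch that splits a 32-bit MIPS word into its fields. The function name `decode_mips` is hypothetical, and the sketch simplifies by treating every opcode other than 0, 2 (`j`) and 3 (`jal`) as I-format, which ignores coprocessor encodings.

```python
def decode_mips(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its format-specific fields."""
    opcode = (word >> 26) & 0x3F             # top 6 bits select the format
    if opcode == 0:                          # R-format: register-register operations
        return {
            "format": "R",
            "opcode": opcode,
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    if opcode in (0x02, 0x03):               # J-format: j and jal
        return {"format": "J", "opcode": opcode, "target": word & 0x03FFFFFF}
    # Simplification: everything else is treated as I-format
    return {
        "format": "I",
        "opcode": opcode,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }

# 0x02324020 encodes `add $t0, $s1, $s2` (rs=17, rt=18, rd=8, funct=0x20)
print(decode_mips(0x02324020))
```

Because the opcode, rs and rt fields sit in the same bit positions in the R and I formats, the same shifts and masks serve both, which is the regularity the control decoder benefits from.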
Below Your Program
• Application software
– Written in high-level language
• System software
– Compiler: translates HLL code to
machine code
– Operating System: service code
• Handling input/output
• Managing memory and storage
• Scheduling tasks & sharing resources
• Hardware
– Processor, memory, I/O
controllers
Design Principles
• Design Principle 1: Simplicity favors regularity.
• Design Principle 2: Smaller is faster
– e.g. 32 registers vs. main memory with billions of locations
• Design Principle 3: Good design demands good
compromises
• Design Principle 4: Make the common case fast
– Small constants are common
– Immediate operand avoids a load instruction
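An immediate operand avoids a load because the small constant travels inside the instruction's 16-bit field and is sign-extended to 32 bits before use. The following is a minimal sketch of that step; the helper names are hypothetical, and unlike the real `addi` (which traps on signed overflow) this model simply wraps to 32 bits.

```python
def sign_extend_16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def addi(rs_value: int, imm16: int) -> int:
    """Model of `addi rt, rs, imm`: rs + sign-extended immediate, wrapped to 32 bits.

    Assumption: overflow wraps here; the real addi raises an exception instead.
    """
    return (rs_value + sign_extend_16(imm16)) & 0xFFFFFFFF

print(sign_extend_16(0xFFFC))   # the field 0xFFFC represents -4
print(addi(10, 0xFFFC))         # 10 + (-4)
```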
03a Glossary
• Basic block
• Big endian
• Binary compatibility
• Byte aligned memory access
• Data directives
• Destination operand
• Frame pointer
• General purpose registers
• Global pointer
• I-format
• J-format
• Immediate operand
• Instruction encoding
• Instruction format
• Instruction set architecture
• Little Endian
• Machine code (or language)
• Memory map
Glossary (cont.)
• Native instructions
• Orthogonal ISA
• PC-relative addressing
• Pseudo instructions
• R-format
• Sign extension
• Source operand
• Stack pointer
• System software vs. application software
• Unsigned vs. signed instructions
• Word aligned memory access
The Von Neumann execution model has these subparts: Processor (with ALU and Registers),
Control (with PC), and Memory.
The meaning of the term has evolved to mean a stored-program computer in which an
instruction fetch and a data operation cannot occur at the same time because they
share a common bus. This is referred to as the Von Neumann bottleneck and often
limits the performance of the system.
- Links are to Wikipedia
03b Procedure Calls – Assembly
Basic functionality
Transfer of parameters & control to procedure
Transfer of results & control back to the calling program
Support for nested procedures.
Procedure code read from "Text" memory section – not writable.
[Figure: argument registers $a0, $a1, $a2, $a3 -> procedure -> result registers $v0, $v1]
Procedure code can also read/write Static Memory
Local Variables are saved on Stack (disappear on return)
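[Figure: stack layout. The Old Stack Frame lies between the old $fp and the old $sp. Below it, toward Low Address, the New Stack Frame lies between the new $fp and the new $sp and holds: arg registers, return address, saved registers, local variables.]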
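The save-and-restore discipline of a stack frame can be sketched with a toy model in which the stack grows toward lower addresses. This is an illustration only: the `Stack` class, the starting address 0x7FFFFFFC, and `call_nested` are assumptions made for the example, not part of the course material.

```python
class Stack:
    """Toy model of a MIPS-style stack that grows toward lower addresses."""

    def __init__(self, top: int = 0x7FFFFFFC):   # assumed initial $sp
        self.sp = top            # models $sp
        self.mem = {}            # word address -> stored value

    def push(self, value: int) -> None:
        self.sp -= 4             # allocate one word (like addi $sp, $sp, -4)
        self.mem[self.sp] = value    # like sw value, 0($sp)

    def pop(self) -> int:
        value = self.mem.pop(self.sp)   # like lw value, 0($sp)
        self.sp += 4             # release the word (like addi $sp, $sp, 4)
        return value


def call_nested(stack: Stack, ra: int, s0: int):
    """Prologue and epilogue of a procedure that itself calls another procedure."""
    stack.push(ra)               # a nested jal would overwrite $ra, so save it
    stack.push(s0)               # saved register
    sp_in_frame = stack.sp       # $sp while the frame is live
    s0_restored = stack.pop()    # restore in reverse order
    ra_restored = stack.pop()
    return sp_in_frame, ra_restored, s0_restored


stack = Stack()
print(call_nested(stack, ra=0x00400024, s0=42))
```

After the epilogue, `$sp` is back at its original value and the frame's contents are gone, which is why local variables "disappear on return".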
[Figure: program translation flow]
C program (code) or Assembler (QtSPIM)
-> compiler -> Assembly (code) / Assembly Program
-> assembler -> Object module (Native Instructions, Assembled Binary)
Object module + Object library -> linker -> executable -> loader -> memory
Object File contents: Header (information), Text segment, Data segment, Relocation information, Symbol table
Loader allocates pages of Physical Memory and maps them to Virtual pages (that code refers to)
Dynamic Linking of Libraries
• Only link/load library procedure when it is called
– Requires procedure code to be relocatable
– Avoids image bloat caused by static linking of all
(transitively) referenced libraries
– Automatically picks up new library versions
• Where do DLLs come from?
• Why is Dynamic Linking important?
– DLLs are stored on disk, in files (not in Physical Memory).
Dynamic Loader must allocate Physical Memory when
they are needed, and free the memory when they are no
longer needed (even though the process continues to
run)
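The "only link/load when called" idea can be sketched by analogy using Python's own import machinery. This is not how an operating system's dynamic loader is implemented; `LazyLibrary` is a hypothetical class that only shows the deferral of loading until first use.

```python
import importlib

class LazyLibrary:
    """Loads a module only when one of its functions is first called."""

    def __init__(self, module_name: str):
        self._module_name = module_name
        self._module = None          # nothing loaded yet
        self.load_count = 0          # how many times the load step actually ran

    def call(self, func_name: str, *args):
        if self._module is None:     # first call: resolve and load the library
            self._module = importlib.import_module(self._module_name)
            self.load_count += 1
        return getattr(self._module, func_name)(*args)


libm = LazyLibrary("math")           # no load happens here
first = libm.call("sqrt", 16.0)      # triggers the load, then calls sqrt
second = libm.call("floor", 2.7)     # reuses the already loaded module
print(first, second, libm.load_count)
```

A program that never calls into the library never pays for loading it, which is the benefit over statically linking every referenced library.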
03b Glossary
• Argument registers
• Caller save registers
• Callee save registers
• Disassembly
• Frame pointer
• Independent compilation
• Labels: local, global, external
• Linker/loader
• Linking: static vs. dynamic vs. lazy
• Native instructions
• Nested procedures
• Object file
• One/two pass assembly
• Procedure invocation
• Pseudo instructions
• Relocatable code
• Stack frame
• Stack pointer
• Symbol table
04a Arithmetic (and Logic, ALU)
Operations on integers:
Bit-wise logic operations: AND, OR, XOR (⊕), Shift (left, right; arithmetic or logical)
Addition and subtraction
A-B done as A + (-B)
Negate B by flipping all bits (B ⊕ 1111111…) and adding 1 (lsb carry-in = 1)
Multiplication and division (more complicated; need an accumulator twice as wide)
Dealing with overflow (exception handler: crash, or saturate to the largest number)
Operation on floating-point real numbers
Representation and operations
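The integer subtraction scheme above (flip the bits of B, add with a carry-in of 1) and the two overflow responses can be sketched as follows. This assumes 32-bit two's-complement words; the function names are illustrative, and saturation is shown as one possible overflow policy rather than what MIPS does by default.

```python
BITS = 32
MASK = (1 << BITS) - 1
INT_MAX = (1 << (BITS - 1)) - 1
INT_MIN = -(1 << (BITS - 1))

def to_signed(x: int) -> int:
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= MASK
    return x - (1 << BITS) if x >> (BITS - 1) else x

def subtract(a: int, b: int):
    """Compute a - b as a + (b XOR 111...1) + 1; return (result, overflow)."""
    not_b = (b & MASK) ^ MASK          # flip every bit of b
    raw = (a & MASK) + not_b + 1       # the +1 is the lsb carry-in
    result = to_signed(raw)            # discard the carry out of bit 31
    sa, sb = to_signed(a), to_signed(b)
    # Signed overflow: operand signs differ and the result's sign differs from a's
    overflow = (sa < 0) != (sb < 0) and (result < 0) != (sa < 0)
    return result, overflow

def saturating_subtract(a: int, b: int) -> int:
    """On overflow, clamp to the largest or smallest representable value."""
    result, overflow = subtract(a, b)
    if not overflow:
        return result
    return INT_MAX if to_signed(a) >= 0 else INT_MIN

print(subtract(7, 5))
print(subtract(INT_MAX, -1))
print(saturating_subtract(INT_MAX, -1))
```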
[Figure: classification by Instruction Streams and Data Streams]
• SISD – today's serial computing cores (von Neumann model)
• SIMD – single instruction, multiple data stream computing, e.g., Intel AVX or SSE
• MISD
• MIMD – today’s multicore
Vector Computation
• Operate on multiple data elements (vectors)
at a time
• Flexible definition/use of registers
• Registers hold integers, floats (SP), doubles (DP)
• A 128-bit register can hold:
– 1 x 128-bit integer
– 2 x 64-bit double precision
– 4 x 32-bit single precision
– 8 x 16-bit short integers
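The flexible use of a 128-bit register can be illustrated by packing values into 16 bytes and viewing the same bits with different lane widths. This sketch uses Python's `struct` module to model the register image; the function `simd_add_f32x4` is a hypothetical stand-in for a vector add and runs lane by lane rather than in parallel.

```python
import struct

# Four single-precision values packed into one 128-bit (16-byte) register image
a = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
b = struct.pack("<4f", 10.0, 20.0, 30.0, 40.0)

def simd_add_f32x4(x: bytes, y: bytes) -> bytes:
    """Lane-wise add of two 128-bit values viewed as 4 x 32-bit floats."""
    lanes = [p + q for p, q in zip(struct.unpack("<4f", x), struct.unpack("<4f", y))]
    return struct.pack("<4f", *lanes)

result = simd_add_f32x4(a, b)

as_floats = struct.unpack("<4f", result)    # 4 x 32-bit single precision
as_shorts = struct.unpack("<8h", result)    # same bits viewed as 8 x 16-bit integers
as_doubles = struct.unpack("<2d", result)   # same bits viewed as 2 x 64-bit doubles

print(len(result) * 8, "bits:", as_floats)
```

The register itself carries no type; the instruction chosen decides whether its 128 bits are treated as 2, 4, or 8 lanes.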
04a Glossary
• Co-processor
• Data parallelism
• Data parallel computation vs. vector computation
• Instruction set extensions
• Overflow
• MIMD
• Precision
• SIMD
• Saturating arithmetic
• Signed arithmetic support
• Unsigned arithmetic support
• Vector processing
04b Energy, Power Dissipation (Heat)
• Dynamic Power Consumption
– Caused by switching transitions -> cost of switching state
• Static Power Consumption
– Caused by leakage currents in the absence of any switching activity
• Power consumption per transistor changes with each
technology generation
– No longer reducing at the same rate
– What happens to power density?
[Figure: CMOS inverter. PMOS transistor between Vdd and Vout, NMOS transistor between Vout and Ground, both gates driven by Vin.]
Energy -> heat each half-cycle: $E = \frac{1}{2} C_L V_{DD}^2$
[Figure: input to the CMOS inverter, Voltage (0 to VDD) vs. Time over period T, with two circuit panels: Output Capacitor Charging (iDD flows from VDD into CL) and Output Capacitor Discharging (CL discharges).]
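The half-cycle energy above leads directly to dynamic power, and to the power-density question raised on this slide. The sketch below is illustrative only: the `activity` factor and the ideal scaling rule (C and V divided by k, frequency multiplied by k, area divided by k squared) are textbook assumptions, and the example values are made up.

```python
def switching_energy(c_load: float, vdd: float) -> float:
    """Energy (J) turned into heat in one half-cycle (one charge or one discharge)."""
    return 0.5 * c_load * vdd ** 2

def dynamic_power(c_load: float, vdd: float, freq_hz: float, activity: float = 1.0) -> float:
    """Average dynamic power (W): two half-cycles per period, scaled by activity."""
    return activity * 2.0 * switching_energy(c_load, vdd) * freq_hz

def dennard_scale(c_load: float, vdd: float, freq_hz: float, area: float, k: float):
    """Ideal Dennard scaling by factor k (assumption: C/k, V/k, f*k, area/k^2)."""
    return c_load / k, vdd / k, freq_hz * k, area / k ** 2

# Made-up example values: 1 fF load, 1.0 V supply, 1 GHz clock, 1 um^2 of area
c, v, f, area = 1e-15, 1.0, 1e9, 1e-12

density_before = dynamic_power(c, v, f) / area
c2, v2, f2, area2 = dennard_scale(c, v, f, area, k=1.4)
density_after = dynamic_power(c2, v2, f2) / area2

print(switching_energy(c, v), dynamic_power(c, v, f))
print(density_before, density_after)
```

Under the ideal rule the two densities match. Once supply voltage can no longer be reduced with each generation, the V squared term stops shrinking and power density rises, which is one way to approach the "why" behind the end of Dennard scaling.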
04b Glossary
• Dynamic Energy
• Dynamic Power
• Load capacitance
• Static Energy
• Static Power
• Time constant
• Threshold voltage
• Switching energy