CS 61C: Great Ideas in Computer Architecture TLP Boolean Logic

CS 61C:
Great Ideas in Computer Architecture
TLP, Switches, Transistors, Gates, and
Boolean Logic
Instructor:
David A. Patterson
http://inst.eecs.Berkeley.edu/~cs61c/sp12
6/27/2016
Spring 2012 -- Lecture #17
You are Here!
Software
• Parallel Requests
Assigned to computer
e.g., Search “Katz”
• Parallel Threads
Assigned to core
e.g., Lookup, Ads
• Parallel Instructions
>1 instruction @ one time
e.g., 5 pipelined instructions
• Parallel Data
>1 data item @ one time
e.g., Add of 4 pairs of words
• Hardware descriptions
All gates @ one time
• Programming Languages
Hardware
Harness Parallelism & Achieve High Performance
• Warehouse Scale Computer, Smart Phone
• Computer: Core … Core, Memory (Cache), Input/Output
• Core: Instruction Unit(s), Functional Unit(s) (A0+B0 A1+B1 A2+B2 A3+B3)
• Cache Memory
• Logic Gates (Today)
Review
• Sequential software is slow software
– SIMD and MIMD only path to higher performance
• Multithreading increases utilization, Multicore
more processors (MIMD)
• Multiprocessor/Multicore uses Shared Memory
– Cache coherency implements shared memory even
with multiple copies in multiple caches
– False sharing a concern; watch block size!
• OpenMP as simple parallel extension to C
– Threads, Parallel for, private, critical sections, …
– ≈ C: small, so easy to learn, but not very high level and
it’s easy to get into trouble
Agenda
• Wrap up TLP
• Administrivia
• Switching Networks, Transistors
• Gates and Truth Tables for Circuits
• Boolean Algebra
• Summary
Review: Strong vs Weak Scaling
• Strong scaling: problem size fixed
• Weak scaling: problem size proportional to
increase in number of processors
– Speedup on multiprocessor while keeping
problem size fixed is harder than speedup by
increasing the size of the problem
– But a natural use of a lot more performance is to
solve a lot bigger problem
Experiment
• Try compiling and running at NUM_THREADS = 64
• Try compiling and running at NUM_THREADS = 64 with -O2
• Try compiling and running at NUM_THREADS = 32, 16, 8, … with -O2
32 Core: Speed-up vs. Scale-up
Speed-up: (strong scaling)
Scale-up: Fl. Pt. Ops = 2 x Size^3
Memory Capacity = f(Size^2), Compute = f(Size^3)
32 Core: Speed-up vs. Scale-up
Scale-up: Fl. Pt. Ops = 2 x Size^3
Memory Capacity = f(Size^2), Compute = f(Size^3)

         Speed-up              Scale-up
Threads  Time (secs)  Speedup  Time (secs)  Size (Dim)  Fl. Ops x 10^9
1        13.75         1.00    13.75        1000         2.00
2         6.88         2.00    13.52        1240         3.81
4         3.45         3.98    13.79        1430         5.85
8         1.73         7.94    12.55        1600         8.19
16        0.88        15.56    13.61        2000        16.00
32        0.47        29.20    13.92        2500        31.25
64        0.71        19.26    13.83        2600        35.15
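As a rough check on the numbers above: the Speedup column is the 1-thread time divided by the n-thread time, and the Fl. Ops column follows the slide's formula 2 x Size^3. The sketch below computes both; the function names are illustrative and not from the lecture.

```c
/* Strong-scaling speedup: time on 1 thread divided by time on n threads
   for the same problem size. */
double speedup(double time_1_thread, double time_n_threads) {
    return time_1_thread / time_n_threads;
}

/* Floating-point operation count from the slide: 2 x Size^3. */
double fl_pt_ops(double size) {
    return 2.0 * size * size * size;
}
```

For example, speedup(13.75, 6.88) is about 2.00 for 2 threads, and fl_pt_ops(1000) is 2 x 10^9, matching the 1-thread row.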
Strong vs. Weak Scaling
[Chart: Improvement (y-axis, 0 to 40) vs. Threads (x-axis, 0 to 64); two series, Scaleup and Speedup]
Top 500 Supercomputer List
• Linpack: solve system of linear equations
– Most time in Z = A*X + Y, where X, Y, Z are vectors
• Matrix size first 100x100, then 1000x1000
• Size too small for supercomputers
– Let problem size increase as big as possible
• Current Fastest: Fujitsu SPARC64 (RISC)
– 548,352 processors @ 2 GHz, 8 cores/chip
– 10.5 PetaFLOPS (10^15 Fl. Pt. Operations/Second)
• Matrix: 10,725,120 x 10,725,120 (!)
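The Z = A*X + Y kernel named above can be sketched in C as follows. This assumes A is a scalar multiplier and X, Y, Z are equal-length vectors; the function name axpy and its signature are illustrative, not taken from Linpack.

```c
#include <stddef.h>

/* Computes z[i] = a * x[i] + y[i] for i in [0, n).
   Assumption: a is a scalar; x, y, z are vectors of length n. */
void axpy(size_t n, double a, const double *x, const double *y, double *z) {
    for (size_t i = 0; i < n; i++) {
        z[i] = a * x[i] + y[i];
    }
}
```

Each iteration is independent of the others, which is why this loop is a natural fit for the data-level and thread-level parallelism discussed earlier.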
Peer Instruction: Why Multicore?
The switch in ~ 2004 from 1 processor per chip to
multiple processors per chip happened because:
I. The “power wall” meant that we could no longer get speed via
higher clock rates and higher power per chip
II. There was no other performance option but replacing
1 inefficient processor with multiple efficient
processors
III. OpenMP was a breakthrough in ~2000 that made
parallel programming easy
A)(orange) I only
B)(green) II only
C)(pink) I & II only
100s of (dead)
Parallel Programming Languages
ActorScript
Ada
Afnix
Alef
Alice
APL
Axum
Chapel
Cilk
Clean
Clojure
Concurrent C
Concurrent Pascal
Concurrent ML
Concurrent Haskell
Curry
CUDA
E
Eiffel
Erlang
Fortran 90
Go
Io
Janus
JoCaml
Join
Java
Joule
Joyce
LabVIEW
Limbo
Linda
MultiLisp
Modula-3
Occam
occam-π
Orc
Oz
Pict
Reia
SALSA
Scala
SISAL
SR
Stackless Python
SuperPascal
VHDL
XC
False Sharing in OpenMP
{ int i; double x, pi, sum[NUM_THREADS];
#pragma omp parallel private (i,x)
{ int id = omp_get_thread_num();
  for (i=id, sum[id]=0.0; i< num_steps; i=i+NUM_THREADS)
  {
    x = (i+0.5)*step;
    sum[id] += 4.0/(1.0+x*x);
  }
}
• What is the problem?
• sum[0] is 8 bytes in memory, sum[1] is the
adjacent 8 bytes in memory => false sharing if
block size ≥ 16 bytes
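One common way to sidestep the problem is OpenMP's reduction clause, shown here as an alternative to the per-thread sum[] array rather than as the lecture's own fix. Each thread accumulates into a private copy of sum, so no two threads write to neighboring words of one cache block. The function name estimate_pi is illustrative. If OpenMP is not enabled, the pragma is ignored and the loop runs serially.

```c
/* Estimates pi by midpoint integration of 4/(1+x^2) over [0,1]. */
double estimate_pi(long num_steps) {
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    /* reduction(+:sum): every thread gets a private sum, and the
       private copies are added together when the loop finishes. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}
```

The trade-off is that the reduction hides the per-thread partial sums, so it does not illustrate cache-block layout the way the sum[] version does.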
Peer Instruction: No False Sharing
{ int i; double x, pi, sum[10000];
#pragma omp parallel private (i,x)
{ int id = omp_get_thread_num(), fix = __________;
  for (i=id, sum[id*fix]=0.0; i< num_steps; i=i+NUM_THREADS)
  {
    x = (i+0.5)*step;
    sum[id*fix] += 4.0/(1.0+x*x);
  }
}
• What is the best value to set fix to in order to prevent false sharing?
A)(orange) omp_get_num_threads();
B)(green) Constant for number of blocks in cache
C)(pink) Constant for size of cache block in bytes
Administrivia
• Need Partners for Project 3: Who has a
partner? Who doesn’t?
• Project 3, Part 2 due Sunday March 25 before
midnight
• Homework due Sunday March 25 before
midnight
• Spring Break: Prefer March 25 or April 1?
– Friday March 23?
Harry Porter’s 8-bit Relay Computer
• Before switches were transistors, they were built using
electronic vacuum tubes
• Before vacuum tubes were used as switches, they were
built using electro-mechanical relays
• Porter (not Harry Potter) is
a CS Prof at Portland State
• video
• http://web.cecs.pdx.edu/~harry/Relay/
Hardware Design
• Next several weeks: we’ll study how a modern processor is
built; starting with basic elements as building blocks
• Why study hardware design?
– Understand capabilities and limitations of hw in general and
processors in particular
– What processors can do fast and what they can’t do fast
(avoid slow things if you want your code to run fast!)
– Background for more in depth hw courses (CS 152)
– Hard to know what you will need for the next 30 years
– There is only so much you can do with standard processors: you
may need to design your own custom hw for extra performance
– Even some commercial processors today have customizable hardware!
Synchronous Digital Systems
Hardware of a processor, such as the MIPS, is an example of
a Synchronous Digital System
Synchronous:
• All operations coordinated by a central clock
– “Heartbeat” of the system!
Digital:
• Represent all values by 2 discrete values
• Electrical signals are treated as 1’s and 0’s
• 1 and 0 are complements of each other
• High/low voltage for true/false, 1/0
Switches: Basic Element of Physical
Implementations
• Implementing a simple circuit (arrow shows action if
wire changes to “1” or is asserted):
– Close switch (if A is “1” or asserted) and turn on light bulb (Z)
– Open switch (if A is “0” or unasserted) and turn off light bulb (Z)
Z ≡ A
Switches (cont’d)
• Compose switches into more complex ones (Boolean
functions):
– AND (switches A and B in series): Z ≡ A and B
– OR (switches A and B in parallel): Z ≡ A or B
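The two compositions can be modeled in a few lines of C. This is a sketch that assumes the usual wiring (series switches for AND, parallel switches for OR); the function names are illustrative.

```c
#include <stdbool.h>

/* Two switches in series: the light is on only if both are closed (AND). */
bool series(bool a, bool b) { return a && b; }

/* Two switches in parallel: the light is on if either is closed (OR). */
bool parallel(bool a, bool b) { return a || b; }
```

Here true stands for a closed (asserted) switch and the return value stands for the light bulb Z.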
Historical Note
• Early computer designers built ad hoc circuits
from switches
• Began to notice common patterns in their work:
ANDs, ORs, …
• Master’s thesis (by Claude Shannon) made the link
between this work and that of the 19th-century
mathematician George Boole
– Called it “Boolean” in his honor
• Could apply math to give theory to
hardware design, minimization, …
Transistors
• High voltage (Vdd) represents 1, or true
• Low voltage (0 volts or Ground) represents 0, or false
• Let threshold voltage (Vth) decide if a 0 or a 1
• If switches control whether voltages can propagate
through a circuit, can build a computer
• Our switches: CMOS transistors
CMOS Transistor Networks
• Modern digital systems designed in CMOS
– MOS: Metal-Oxide on Semiconductor
– C for complementary: use pairs of normally-open and
normally-closed switches
• Used to be called COS-MOS for complementary-symmetry MOS
• CMOS transistors act as voltage-controlled
switches
– Similar to, though easier to work with than, relay
switches from earlier era (Porter computer)
– Use energy primarily when switching
CMOS Transistors
• Three terminals: source, gate, and drain
– Switch action:
if voltage on gate terminal is (some amount) higher/lower
than source terminal, then a conducting path is established
between drain and source terminals (switch is closed)
n-channel transistor
– open when voltage at Gate is low
– closes when: voltage(Gate) > voltage(Threshold)
– (High resistance when gate voltage Low, Low resistance when gate voltage High)
p-channel transistor
– Note circle symbol to indicate “NOT” or “complement”
– closed when voltage at Gate is low
– opens when: voltage(Gate) > voltage(Threshold)
– (Low resistance when gate voltage Low, High resistance when gate voltage High)
CMOS circuit rules
• Don’t pass weak values => Use Complementary Pairs
– N-type transistors pass weak 1’s (Vdd - Vth)
– N-type transistors pass strong 0’s (ground)
– Use N-type transistors only to pass 0’s (N for negative)
– Converse for P-type transistors: Pass weak 0’s, strong 1’s
• Pass weak 0’s (Vth), strong 1’s (Vdd)
• Use P-type transistors only to pass 1’s (P for positive)
– Use pairs of N-type and P-type to get strong
values
• Never leave a wire undriven
– Make sure there’s always a path to Vdd or gnd
• Never create a path from Vdd to gnd (ground)
MOS Networks
p-channel transistor
closed when voltage at Gate is low
opens when: voltage(Gate) > voltage(Threshold)
n-channel transistor
open when voltage at Gate is low
closes when: voltage(Gate) > voltage(Threshold)
[Circuit diagram: input X, output Y, 3v supply, 0v ground]
What is the relationship between x and y?
x              y
0 volts (gnd)
3 volts (Vdd)
Called an inverter or NOT gate
MOS Networks
n-channel transistor
open when voltage at Gate is low
closes when voltage(Gate) > voltage(Source) + ε
p-channel transistor
closed when voltage at Gate is low
closes when voltage(Gate) < voltage(Source) – ε
[Circuit diagram: input X, output Y, 3v supply, 0v ground]
What is the relationship between x and y?
x              y
0 volts (gnd)  3 volts (Vdd)
3 volts (Vdd)  0 volts (gnd)
Called an inverter or NOT gate
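The inverter's behavior can be reproduced with a switch-level model in C. This is a simplified sketch: it treats each transistor as an ideal switch, assumes the p-channel transistor connects y to Vdd and the n-channel transistor connects y to gnd, and uses illustrative function names.

```c
#include <stdbool.h>

/* Switch-level model: true = 3 volts (Vdd), false = 0 volts (gnd). */

/* n-channel transistor: closed (conducting) when its gate is high. */
bool nmos_closed(bool gate) { return gate; }

/* p-channel transistor: closed (conducting) when its gate is low. */
bool pmos_closed(bool gate) { return !gate; }

/* CMOS inverter: for any input exactly one transistor is closed,
   so y is always driven and Vdd is never shorted to gnd. */
bool inverter(bool x) {
    bool path_to_vdd = pmos_closed(x);  /* pulls y up to 3 volts   */
    bool path_to_gnd = nmos_closed(x);  /* pulls y down to 0 volts */
    return path_to_vdd && !path_to_gnd;
}
```

With x at 0 volts the p-channel switch closes and y is 3 volts; with x at 3 volts the n-channel switch closes and y is 0 volts.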
P = ½ C V^2 f
• Dynamic Energy (when switching) is
proportional to Capacitance * Voltage^2
• Since pulse is 0 -> 1 -> 0 or 1 -> 0 -> 1,
Energy of a single transition is proportional to
½ * Capacitance * Voltage^2
• Power is just energy per transition times
frequency of transitions: proportional to
½ * Capacitance * Voltage^2 * Frequency
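A direct transcription of P = ½ C V^2 f into C makes the voltage dependence easy to experiment with. The function and parameter names are illustrative, and the units (farads, volts, hertz, watts) are an assumption about how the formula would be applied.

```c
/* Dynamic power: P = 1/2 * C * V^2 * f.
   Assumed units: farads, volts, hertz; result in watts. */
double dynamic_power(double capacitance, double voltage, double frequency) {
    return 0.5 * capacitance * voltage * voltage * frequency;
}
```

Because voltage is squared, halving the voltage at a fixed capacitance and frequency cuts dynamic power to one quarter.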
Two Input Networks
Student Roulette
[Two circuit diagrams, each with inputs X and Y, output Z, 3v supply, 0v ground]
What is the relationship between x, y and z?
First circuit:
x        y        z
0 volts  0 volts
0 volts  3 volts
3 volts  0 volts
3 volts  3 volts
Second circuit:
x        y        z
0 volts  0 volts
0 volts  3 volts
3 volts  0 volts
3 volts  3 volts
Two Input Networks: Peer Instruction
[Two circuit diagrams, each with inputs X and Y, output Z, 3v supply, 0v ground]
What is the relationship between x, y and z?
First circuit: called NAND gate (NOT AND)
x        y        z
0 volts  0 volts  3 volts
0 volts  3 volts  3 volts
3 volts  0 volts  3 volts
3 volts  3 volts  0 volts
Second circuit: choices A, B, C for z (in volts)
x        y        A  B  C
0 volts  0 volts  0  0  3
0 volts  3 volts  0  3  0
3 volts  0 volts  0  3  0
3 volts  3 volts  3  3  0
Two Input Networks
[Two circuit diagrams, each with inputs X and Y, output Z, 3v supply, 0v ground]
What is the relationship between x, y and z?
First circuit: called NAND gate (NOT AND)
x        y        z
0 volts  0 volts  3 volts
0 volts  3 volts  3 volts
3 volts  0 volts  3 volts
3 volts  3 volts  0 volts
Second circuit: called NOR gate (NOT OR)
x        y        z
0 volts  0 volts  3 volts
0 volts  3 volts  0 volts
3 volts  0 volts  0 volts
3 volts  3 volts  0 volts
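Both tables can be derived from a switch-level model. The sketch below assumes the standard CMOS topology, which the text here does not spell out: NAND puts its p-channel transistors in parallel to Vdd and its n-channel transistors in series to gnd, and NOR swaps series and parallel. Function names are illustrative.

```c
#include <stdbool.h>

/* true = 3 volts, false = 0 volts.
   p-channel switches close on a low gate; n-channel on a high gate. */

/* NAND (assumed topology): p-channel in parallel, n-channel in series. */
bool nand_gate(bool x, bool y) {
    bool path_to_vdd = !x || !y;  /* either p-channel switch closed */
    bool path_to_gnd = x && y;    /* both n-channel switches closed */
    return path_to_vdd && !path_to_gnd;
}

/* NOR (assumed topology): p-channel in series, n-channel in parallel. */
bool nor_gate(bool x, bool y) {
    bool path_to_vdd = !x && !y;  /* both p-channel switches closed  */
    bool path_to_gnd = x || y;    /* either n-channel switch closed  */
    return path_to_vdd && !path_to_gnd;
}
```

In both gates the pull-up and pull-down conditions are complements of each other, which is what keeps z always driven without ever connecting Vdd to gnd.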
Truth Tables
List outputs for all possible inputs
[Figure: function F with inputs A, B, C, D and output Y]
Truth Table Example #1:
y = F(a,b): 1 iff a ≠ b
a  b  y
0  0  0
0  1  1
1  0  1
1  1  0
Y = (NOT A)B + A(NOT B)
Y = A ⊕ B
XOR
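A short exhaustive check confirms that the sum-of-products form agrees with the "1 iff a ≠ b" definition on all four rows. The function names are illustrative.

```c
#include <stdbool.h>

/* Sum-of-products form: y = (NOT a)b + a(NOT b). */
bool xor_sop(bool a, bool b) {
    return (!a && b) || (a && !b);
}

/* Returns true if xor_sop matches "a != b" on every row of the table. */
bool xor_matches_table(void) {
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            if (xor_sop(a, b) != (a != b)) {
                return false;
            }
        }
    }
    return true;
}
```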
Truth Table Example #2:
2-bit Adder
[Figure: adder (+) with inputs A1, A0, B1, B0 and outputs C2, C1, C0]
How many rows?
Truth Table Example #3:
32-bit Unsigned Adder
How many rows?
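Each additional input bit doubles the number of rows, so a truth table with n one-bit inputs has 2^n rows. The 2-bit adder has 4 input bits (A1, A0, B1, B0) and so 16 rows. The 32-bit adder has 64 input bits and so 2^64 rows, which is one more than a 64-bit unsigned integer can hold. A minimal sketch, with an illustrative function name:

```c
#include <stdint.h>

/* Number of rows in a truth table with n one-bit inputs: 2^n.
   Only valid for n < 64, since the result must fit in 64 bits. */
uint64_t truth_table_rows(unsigned n_inputs) {
    return (uint64_t)1 << n_inputs;
}
```

This is why truth tables are practical only for small circuits, and larger ones are built by composing small blocks.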
Truth Table Example #4:
3-input Majority Circuit
Y = (NOT A)BC + A(NOT B)C + AB(NOT C) + ABC
This is called Sum of Products form;
just another way to represent the TT
as a logical expression
Y = BC + A((NOT B)C + B(NOT C))
Y = BC + A(B + C)
More simplified forms
(fewer gates and wires)
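One way to gain confidence in a simplification is to compare the forms on all 8 input rows. The sketch below checks the sum-of-products form against Y = BC + A(B + C); the function names are illustrative.

```c
#include <stdbool.h>

/* Sum of Products: one term per truth-table row where the output is 1. */
bool majority_sop(bool a, bool b, bool c) {
    return (!a && b && c) || (a && !b && c) || (a && b && !c) || (a && b && c);
}

/* Simplified form: Y = BC + A(B + C). */
bool majority_simplified(bool a, bool b, bool c) {
    return (b && c) || (a && (b || c));
}

/* Returns true if the two forms agree on all 8 input combinations. */
bool majority_forms_agree(void) {
    for (int i = 0; i < 8; i++) {
        bool a = (i >> 2) & 1;
        bool b = (i >> 1) & 1;
        bool c = i & 1;
        if (majority_sop(a, b, c) != majority_simplified(a, b, c)) {
            return false;
        }
    }
    return true;
}
```

The simplified form uses fewer AND and OR operations, which corresponds to fewer gates and wires in hardware.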
Combinational Logic Symbols
• Common combinational logic systems have standard symbols
called logic gates
– Buffer, NOT (input A, output Z)
– AND, NAND (inputs A and B, output Z)
– OR, NOR (inputs A and B, output Z)
Easy to implement
with CMOS transistors
(the switches we have
available and use most)
Boolean Algebra
• Use plus for OR
– “logical sum”
• Use product for AND (a•b or implied via ab)
– “logical product”
• “Hat” to mean complement (NOT)
• Thus
ab + a + ĉ
= a•b + a + ĉ
= (a AND b) OR a OR (NOT c)
Boolean Algebra: Circuit & Algebraic
Simplification
Laws of Boolean Algebra
Boolean Algebraic Simplification
Example
Boolean Algebraic Simplification
Example
a  b  c  y
0  0  0  0
0  0  1  1
0  1  0  0
0  1  1  1
1  0  0  1
1  0  1  1
1  1  0  1
1  1  1  1
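Reading the table, y is 1 whenever a or c is 1, regardless of b, which suggests y = a + c. That expression is an inference from the table rows rather than a quotation from the slide, and the names below are illustrative. The sketch checks the candidate against every row.

```c
#include <stdbool.h>

/* y column of the truth table, indexed by the 3-bit number abc. */
static const int table_y[8] = {0, 1, 0, 1, 1, 1, 1, 1};

/* Candidate simplified expression (inferred from the table): y = a + c. */
bool y_simplified(bool a, bool b, bool c) {
    (void)b;  /* b does not affect the output */
    return a || c;
}

/* Returns true if the candidate matches all 8 rows of the table. */
bool candidate_matches_table(void) {
    for (int i = 0; i < 8; i++) {
        bool a = (i >> 2) & 1;
        bool b = (i >> 1) & 1;
        bool c = i & 1;
        if (y_simplified(a, b, c) != (table_y[i] != 0)) {
            return false;
        }
    }
    return true;
}
```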
Summary
• Real world voltages are analog, but are
quantized to represent logic 0 and logic 1
• Transistors are just switches, combined to
form gates: AND, OR, NOT, NAND, NOR
• Truth table can be mapped to gates for
combinational logic design
• Boolean algebra allows minimization of gates