CS 61C: Great Ideas in Computer Architecture
TLP, Switches, Transistors, Gates, and Boolean Logic
Instructor: David A. Patterson
http://inst.eecs.Berkeley.edu/~cs61c/sp12
6/27/2016 Spring 2012 -- Lecture #17

You are Here!
Harness Parallelism & Achieve High Performance
• Parallel Requests: assigned to computer, e.g., Search “Katz”
• Parallel Threads: assigned to core, e.g., Lookup, Ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item @ one time, e.g., Add of 4 pairs of words
• Hardware descriptions: all gates @ one time
• Programming Languages
[Figure: software-to-hardware hierarchy, from Warehouse Scale Computer and Smart Phone through Computer, Core, Memory (Cache), Input/Output, Instruction Unit(s), Functional Unit(s) (A0+B0, A1+B1, A2+B2, A3+B3) and Cache Memory down to Logic Gates, with Logic Gates marked “Today”]

Review
• Sequential software is slow software
– SIMD and MIMD only path to higher performance
• Multithreading increases utilization, Multicore more processors (MIMD)
• Multiprocessor/Multicore uses Shared Memory
– Cache coherency implements shared memory even with multiple copies in multiple caches
– False sharing a concern; watch block size!
• OpenMP as simple parallel extension to C
– Threads, Parallel for, private, critical sections, …
– ≈ C: small so easy to learn, but not very high level and it's easy to get into trouble

Agenda
• Wrap up TLP
• Administrivia
• Switching Networks, Transistors
• Gates and Truth Tables for Circuits
• Boolean Algebra
• Summary

Review: Strong vs Weak Scaling
• Strong scaling: problem size fixed
• Weak scaling: problem size proportional to increase in number of processors
– Speedup on a multiprocessor while keeping problem size fixed is harder than speedup by increasing the size of the problem
– But a natural use of a lot more performance is to solve a lot bigger problem

Experiment
• Try compile and run at NUM_THREADS = 64
• Try compile and run at NUM_THREADS = 64 with -O2
• Try compile and run at NUM_THREADS = 32, 16, 8, … with -O2

32 Core: Speed-up vs. Scale-up
Scale-up: Fl. Pt. Ops = $2 \times \text{Size}^3$
Memory Capacity = $f(\text{Size}^2)$, Compute = $f(\text{Size}^3)$

           Speed-up (strong scaling)    Scale-up
Threads    Time (secs)   Speedup        Time (secs)   Size (Dim)   Fl. Ops x 10^9
1          13.75          1.00          13.75         1000          2.00
2           6.88          2.00          13.52         1240          3.81
4           3.45          3.98          13.79         1430          5.85
8           1.73          7.94          12.55         1600          8.19
16          0.88         15.56          13.61         2000         16.00
32          0.47         29.20          13.92         2500         31.25
64          0.71         19.26          13.83         2600         35.15

Strong vs. Weak Scaling
[Figure: Improvement (0 to 40) vs. Threads (0 to 64), with one curve for Scaleup and one for Speedup]

Top 500 Supercomputer List
• Linpack: solve system of linear equations
– Most time in Z = A*X + Y, where X, Y, Z are vectors
• Matrix size first 100x100, then 1000x1000
• Size too small for supercomputers
– Let problem size increase as big as possible
• Current Fastest: Fujitsu SPARC64 (RISC)
– 548,352 processors @ 2 GHz, 8 cores/chip
– 10.5 PetaFLOPS ($10^{15}$ Fl. Pt. Operations/Second)
• Matrix: 10,725,120 x 10,725,120 (!)

Peer Instruction: Why Multicore?
The switch in ~2004 from 1 processor per chip to multiple processors per chip happened because:
I. The “power wall” meant that we could no longer get speed via higher clock rates and higher power per chip
II. There was no other performance option but replacing 1 inefficient processor with multiple efficient processors
III. OpenMP was a breakthrough in ~2000 that made parallel programming easy
A)(orange) I only
B)(green) II only
C)(pink) I & II only

100s of (dead) Parallel Programming Languages
ActorScript, Ada, Afnix, Alef, Alice, APL, Axum, Chapel, Cilk, Clean, Clojure, Concurrent C, Concurrent Pascal, Concurrent ML, Concurrent Haskell, Curry, CUDA, E, Eiffel, Erlang, Fortran 90, Go, Io, Janus, JoCaml, Join Java, Joule, Joyce, LabVIEW, Limbo, Linda, MultiLisp, Modula-3, Occam, occam-π, Orc, Oz, Pict, Reia, SALSA, Scala, SISAL, SR, Stackless Python, SuperPascal, VHDL, XC

False Sharing in OpenMP
    { int i; double x, pi, sum[NUM_THREADS];
    #pragma omp parallel private (i,x)
    { int id = omp_get_thread_num();
      for (i=id, sum[id]=0.0; i< num_steps; i=i+NUM_THREADS) {
        x = (i+0.5)*step;
        sum[id] += 4.0/(1.0+x*x);
      }
• What is the problem?
• sum[0] is 8 bytes in memory, sum[1] is the adjacent 8 bytes in memory => false sharing if block size ≥ 16 bytes

Peer Instruction: No False Sharing
    { int i; double x, pi, sum[10000];
    #pragma omp parallel private (i,x)
    { int id = omp_get_thread_num(), fix = __________;
      for (i=id, sum[id*fix]=0.0; i< num_steps; i=i+NUM_THREADS) {
        x = (i+0.5)*step;
        sum[id*fix] += 4.0/(1.0+x*x);
      }
• What is the best value to set fix to in order to prevent false sharing?
A)(orange) omp_get_num_threads();
B)(green) Constant for number of blocks in cache
C)(pink) Constant for size of cache block in bytes

Administrivia
• Need Partners for Project 3: Who has a partner? Who doesn’t?
• Project 3, Part 2 due Sunday March 25 before midnight
• Homework due Sunday March 25 before midnight
• Spring Break: Prefer March 25 or April 1?
– Friday March 23?

Harry Porter’s 8-bit Relay Computer
• Before switches were transistors, they were built using electronic vacuum tubes
• Before vacuum tubes were used as switches, they were built using electro-mechanical relays
• Porter (not Harry Potter) is a CS Prof at Portland State
• Video: http://web.cecs.pdx.edu/~harry/Relay/

Agenda
• Wrap up TLP
• Administrivia
• Switching Networks, Transistors
• Gates and Truth Tables for Circuits
• Boolean Algebra
• Summary

Hardware Design
• Next several weeks: we’ll study how a modern processor is built, starting with basic elements as building blocks
• Why study hardware design?
– Understand capabilities and limitations of hw in general and processors in particular
– What processors can do fast and what they can’t do fast (avoid slow things if you want your code to run fast!)
– Background for more in-depth hw courses (CS 152)
– Hard to know what you will need for the next 30 years
– There is only so much you can do with standard processors: you may need to design your own custom hw for extra performance
– Even some commercial processors today have customizable hardware!

Synchronous Digital Systems
Hardware of a processor, such as the MIPS, is an example of a Synchronous Digital System
Synchronous:
• All operations coordinated by a central clock
• “Heartbeat” of the system!
Digital:
• Represent all values by 2 discrete values
• Electrical signals are treated as 1’s and 0’s
• 1 and 0 are complements of each other
• High / low voltage for true / false, 1 / 0

Switches: Basic Element of Physical Implementations
• Implementing a simple circuit (arrow shows action if wire changes to “1” or is asserted):
– Close switch (if A is “1” or asserted) and turn on light bulb (Z)
– Open switch (if A is “0” or unasserted) and turn off light bulb (Z)
• Z ≡ A
[Figure: switch A in series with light bulb Z]

Switches (cont’d)
• Compose switches into more complex ones (Boolean functions):
– AND: Z ≡ A and B
– OR: Z ≡ A or B
[Figure: two switch networks with inputs A and B driving output Z, one for AND and one for OR]

Historical Note
• Early computer designers built ad hoc circuits from switches
• Began to notice common patterns in their work: ANDs, ORs, …
• Master’s thesis (by Claude Shannon) made the link between this work and that of 19th-century mathematician George Boole
– Called it “Boolean” in his honor
• Could apply math to give theory to hardware design, minimization, …

Transistors
• High voltage (Vdd) represents 1, or true
• Low voltage (0 volts or Ground) represents 0, or false
• Let threshold voltage (Vth) decide if a 0 or a 1
• If switches control whether voltages can propagate through a circuit, we can build a computer
• Our switches: CMOS transistors

CMOS Transistor Networks
• Modern digital systems designed in CMOS
– MOS: Metal-Oxide on Semiconductor
– C for complementary: use pairs of normally-open and normally-closed switches
• Used to be called COS-MOS for complementary-symmetry MOS
• CMOS transistors act as voltage-controlled switches
– Similar to, though easier to work with than, relay switches from an earlier era (Porter computer)
– Use energy primarily when switching

CMOS Transistors
• Three terminals: source, gate, and drain
– Switch action: if voltage on gate terminal is (some amount) higher/lower than source terminal, then a conducting path is established between drain and source terminals (switch is closed)
• n-channel transistor
– Open when voltage at Gate is low
– Closes when: voltage(Gate) > voltage(Threshold)
– (High resistance when gate voltage Low, Low resistance when gate voltage High)
• p-channel transistor
– Closed when voltage at Gate is low
– Opens when: voltage(Gate) > voltage(Threshold)
– (Low resistance when gate voltage Low, High resistance when gate voltage High)
– Note circle symbol to indicate “NOT” or “complement”
[Figure: transistor symbols with Gate, Source, and Drain terminals labeled]

CMOS circuit rules
• Don’t pass weak values => Use Complementary Pairs
– N-type transistors pass weak 1’s (Vdd - Vth)
– N-type transistors pass strong 0’s (ground)
– Use N-type transistors only to pass 0’s (N for negative)
– Converse for P-type transistors: Pass weak 0’s, strong 1’s
• Pass weak 0’s (Vth), strong 1’s (Vdd)
• Use P-type transistors only to pass 1’s (P for positive)
– Use pairs of N-type and P-type to get strong values
• Never leave a wire undriven
– Make sure there’s always a path to Vdd or gnd
• Never create a path from Vdd to gnd (ground)

MOS Networks
What is the relationship between x and y?
• p-channel transistor: closed when voltage at Gate is low; opens when voltage(Gate) > voltage(Threshold)
• n-channel transistor: open when voltage at Gate is low; closes when voltage(Gate) > voltage(Threshold)
[Figure: network between 3v and 0v with input X and output Y, built from one p-channel and one n-channel transistor]

x                y
0 volts (gnd)    3 volts (Vdd)
3 volts (Vdd)    0 volts (gnd)

Called an inverter or NOT gate

$P = \tfrac{1}{2} C V^2 f$
• Dynamic Energy (when switching) is proportional to Capacitance * Voltage^2
• Since pulse is 0 -> 1 -> 0 or 1 -> 0 -> 1, Energy of a single transition is proportional to ½ * Capacitance * Voltage^2
• Power is just energy per transition times frequency of transitions: proportional to ½ * Capacitance * Voltage^2 * Frequency

Two Input Networks (Student Roulette)
What is the relationship between x, y and z?
[Figure: two transistor networks, each between 3v and 0v with inputs X and Y and output Z; each has a table of x and y values (0 volts or 3 volts) with z left to fill in]

Two Input Networks: Peer Instruction
What is the relationship between x, y and z?
First network, called NAND gate (NOT AND):

x          y          z
0 volts    0 volts    3 volts
0 volts    3 volts    3 volts
3 volts    0 volts    3 volts
3 volts    3 volts    0 volts

Second network: which column (A, B, or C) gives z, in volts?

x          y          A    B    C
0 volts    0 volts    0    0    3
0 volts    3 volts    0    3    0
3 volts    0 volts    0    3    0
3 volts    3 volts    3    3    0

Two Input Networks
Second network, called NOR gate (NOT OR):

x          y          z
0 volts    0 volts    3 volts
0 volts    3 volts    0 volts
3 volts    0 volts    0 volts
3 volts    3 volts    0 volts

Truth Tables
List outputs for all possible inputs
[Figure: block F with inputs A, B, C, D and output Y, next to its truth table]

Truth Table Example #1: y = F(a,b): 1 iff a ≠ b

a    b    y
0    0    0
0    1    1
1    0    1
1    1    0

$Y = \overline{A}B + A\overline{B}$
$Y = A \oplus B$ (XOR)

Truth Table Example #2: 2-bit Adder
How Many Rows?
[Figure: adder with inputs A1 A0 and B1 B0 and outputs C2 C1 C0]

Truth Table Example #3: 32-bit Unsigned Adder
How Many Rows?

Truth Table Example #4: 3-input Majority Circuit
$Y = \overline{A}BC + A\overline{B}C + AB\overline{C} + ABC$
This is called Sum of Products form; just another way to represent the TT as a logical expression
$Y = BC + A(\overline{B}C + B\overline{C})$
$Y = BC + A(B + C)$
More simplified forms (fewer gates and wires)

Combinational Logic Symbols
• Common combinational logic systems have standard symbols called logic gates
– Buffer, NOT (input A, output Z)
– AND, NAND (inputs A and B, output Z)
Easy to implement with CMOS transistors (the switches we have available and use most)
– OR, NOR (inputs A and B, output Z)

Boolean Algebra
• Use plus for OR
– “logical sum”
• Use product for AND ($a \cdot b$ or implied via $ab$)
– “logical product”
• “Hat” to mean complement (NOT)
• Thus $ab + a + \overline{c} = a \cdot b + a + \overline{c}$ = (a AND b) OR a OR (NOT c)

Boolean Algebra: Circuit & Algebraic Simplification
[Figure not recoverable from the extracted text]

Laws of Boolean Algebra
[Table not recoverable from the extracted text]

Boolean Algebraic Simplification Example
[Derivation not recoverable from the extracted text]

Boolean Algebraic Simplification Example

a    b    c    y
0    0    0    0
0    0    1    1
0    1    0    0
0    1    1    1
1    0    0    1
1    0    1    1
1    1    0    1
1    1    1    1

Summary
• Real world voltages are analog, but are quantized to represent logic 0 and logic 1
• Transistors are just switches, combined to form gates: AND, OR, NOT, NAND, NOR
• Truth table can be mapped to gates for combinational logic design
• Boolean algebra allows minimization of gates
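
As a rough illustration of the last three summary points, the C sketch below treats transistors as ideal switches, composes them into NOT, NAND, and NOR, and checks that the sum-of-products and simplified forms of the 3-input majority circuit agree on every truth-table row. All function names here (`cmos_nand`, `majority_min`, and so on) are hypothetical, and the switch model is an assumption that ignores threshold voltages and weak values. The pull-up and pull-down arrangements used for NAND and NOR are the standard CMOS ones, not something read off the slide figures.

```c
#include <assert.h>
#include <stdbool.h>

/* Idealized switches (assumption: thresholds and weak values are ignored).
   n-channel: closed (conducting) when its gate is 1.
   p-channel: closed (conducting) when its gate is 0. */
static bool n_closed(bool gate) { return gate; }
static bool p_closed(bool gate) { return !gate; }

/* Output of one CMOS stage: 1 if the pull-up network connects it to Vdd,
   0 if the pull-down network connects it to ground. The assert encodes two
   circuit rules: never leave the wire undriven, never short Vdd to ground. */
static bool drive(bool pull_up, bool pull_down) {
    assert(pull_up != pull_down);
    return pull_up;
}

/* Inverter: one p-channel to Vdd, one n-channel to ground. */
bool cmos_not(bool x) {
    return drive(p_closed(x), n_closed(x));
}

/* NAND: p-channels in parallel to Vdd, n-channels in series to ground. */
bool cmos_nand(bool x, bool y) {
    return drive(p_closed(x) || p_closed(y), n_closed(x) && n_closed(y));
}

/* NOR: p-channels in series to Vdd, n-channels in parallel to ground. */
bool cmos_nor(bool x, bool y) {
    return drive(p_closed(x) && p_closed(y), n_closed(x) || n_closed(y));
}

/* Majority circuit in sum-of-products form, one term per truth-table row
   whose output is 1. */
bool majority_sop(bool a, bool b, bool c) {
    return (!a && b && c) || (a && !b && c) || (a && b && !c) || (a && b && c);
}

/* Majority circuit after simplification: Y = BC + A(B + C). */
bool majority_min(bool a, bool b, bool c) {
    return (b && c) || (a && (b || c));
}

/* Returns true if both majority forms agree on all 8 truth-table rows. */
bool majority_forms_agree(void) {
    for (int row = 0; row < 8; row++) {
        bool a = (row >> 2) & 1, b = (row >> 1) & 1, c = row & 1;
        if (majority_sop(a, b, c) != majority_min(a, b, c)) return false;
    }
    return true;
}
```

Calling these from a small `main`, for example `cmos_nand(true, true)` or `majority_forms_agree()`, is one way to compare the results against the NAND, NOR, and majority tables in the slides. Exhaustive checking is practical here only because the functions have two or three inputs.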