CS 152 Computer Architecture and Engineering
Lecture 26 -- Midterm II Review Session
2014-4-29
John Lazzaro (not a prof - “John” is always OK)
TA: Eric Love
www-inst.eecs.berkeley.edu/~cs152/

CS 152 L16: Midterm I Review UC Regents Spring 2014 © UCB

Today - Midterm II Review Session
Study Tips
HW 2, problem by problem (if there is time)
HKN

CS152 Midterm II
May 1st, 2014
Name:
SSID:

#     Points
1     25
2     25
3     25
4     25
Tot   100

“All the work is my own. I have no prior knowledge of the exam contents, aside from guidance from class staff. I will not share the contents with others in CS152 who have not taken it yet.”
Signature:
Please write clearly, and put your name on each page. Please abide by word limits. Good luck!
Eric Love, John Lazzaro

What does it cover?
Lectures 9 onward.
Focus will be on problems that require you to do a task (write a small program, trace through execution, etc.) that demonstrates that you understand a concept. [...]
No transistor-level questions (DRAM and SRAM cells, etc.)
Time for a quick walk-through ...

CS 152 Computer Architecture and Engineering
Lecture 9 -- Memory
2014-2-18

Latency is not the same as bandwidth! Thus, push to faster DRAM interfaces.
[Figure labels: 13-bit row address input; 1 of 8192 decoder]
What if we want all of the 16384 bits?
In row access time (55 ns) we can do 22 transfers at 400 MT/s.
16-bit chip bus -> 22 x 16 = 352 bits << 16384
Now the row access time looks fast!
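The transfer arithmetic above can be checked with a few lines of Python. This is only a sketch of the slide's numbers; the variable names are made up for illustration, and the time to stream a full row is derived here rather than taken from the slide.

```python
# Numbers from the slide: 55 ns row access, 400 MT/s interface, 16-bit bus.
row_access_ns = 55
transfers_per_us = 400            # 400 MT/s = 400 transfers per microsecond
bus_bits = 16
row_bits = 16384                  # bits delivered by the sense amps

# Transfers that fit inside one row access time, and the bits they carry.
transfers_in_row_access = row_access_ns * transfers_per_us // 1000
bits_in_row_access = transfers_in_row_access * bus_bits

# Derived (not on the slide): transfers and time needed to move the whole row.
transfers_for_row = row_bits // bus_bits
row_stream_ns = transfers_for_row * 1000 // transfers_per_us

print(transfers_in_row_access, bits_in_row_access, transfers_for_row, row_stream_ns)
```

Under these assumptions, streaming the whole row over the pins takes far longer than the row access itself, which is why the row access "looks fast" by comparison.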
[Figure labels: 16384 columns; 8192 rows; 134 217 728 usable bits (tester found good bits in bigger array); 16384 bits delivered by sense amps; Select requested bits, send off the]

CS 152 Computer Architecture and Engineering
Lecture 10 -- Cache I
2014-2-20

Latency: A closer look
Read latency: Time to return first byte of a random access

                   Reg    L1 Inst  L1 Data  L2     DRAM   Disk
Size               1K     64K      32K      512K   256M   80G
Latency (cycles)   1      3        3        11     160    1E+07
Latency (sec)      0.6n   1.9n     1.9n     6.9n   100n   12.5m
1/Latency (Hz)     1.6G   533M     533M     145M   10M    80

Architect’s latency toolkit:
(1) Parallelism. Request data from N 1-bit-wide memories at the same time. Overlaps latency cost for all N bits. Provides N times the bandwidth. Requests to N memory banks (interleaving) have potential of N times the bandwidth.
(2) Pipeline memory. If memory has N cycles of latency, issue a request each cycle, receive it N cycles later.

CS 152 Computer Architecture and Engineering
Lecture 11 -- Cache II
2014-2-25

Issue #4: When to write to lower level ...

                                            Write-Through                 Write-Back
Policy                                      Data written to cache block   Write data only to the cache.
                                            also written to lower-level   Update lower level when a block
                                            memory                        falls out of the cache
Do read misses produce writes?              No                            Yes
Do repeated writes make it to lower level?  Yes                           No

Related issue: Do writes to blocks not in the cache get put in the cache (“write-allocate”) or not?
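The two rows of the write-policy comparison can be reproduced with a toy model. The sketch below is not from the lecture: it assumes a hypothetical one-block cache with write-allocate, and it only counts how many writes reach the lower level.

```python
class OneBlockCache:
    """Hypothetical single-block cache that counts writes to the lower level."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.tag = None          # block currently held, if any
        self.dirty = False       # only meaningful for write-back
        self.lower_writes = 0    # writes that reached lower-level memory

    def access(self, block, is_write):
        if block != self.tag:
            # Miss: a dirty write-back block is written out as it leaves.
            if self.write_back and self.dirty:
                self.lower_writes += 1
            self.dirty = False
            self.tag = block     # write-allocate is assumed in this sketch
        if is_write:
            if self.write_back:
                self.dirty = True        # defer the lower-level update
            else:
                self.lower_writes += 1   # write-through: every write goes down


# Three writes to block 0, then a read miss on block 1.
trace = [(0, True), (0, True), (0, True), (1, False)]

write_through = OneBlockCache(write_back=False)
write_back = OneBlockCache(write_back=True)
for block, is_write in trace:
    write_through.access(block, is_write)
    write_back.access(block, is_write)

print(write_through.lower_writes, write_back.lower_writes)
```

In this trace the write-through cache sends all three repeated writes down and the read miss adds nothing, while the write-back cache sends nothing until the read miss evicts the dirty block.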
CS 152 Computer Architecture and Engineering
Lecture 12 -- Virtual Memory
2014-2-27

The TLB caches page table entries
In this example, physical and virtual pages must be the same size!
[Figure: virtual address (page, offset) is translated, per ASID, through the TLB and page table into a physical address (frame, offset)]
MIPS handles TLB misses in software (random replacement). Other machines use hardware.
V=0 pages either reside on disk or have not yet been allocated. OS handles V=0 “Page fault”.

CS 152 Computer Architecture and Engineering
Lecture 13 - Synchronization
2014-3-4

Non-blocking consumer synchronization
Another atomic read-modify-write instruction:

    Compare&Swap(Rt, Rs, m)
      if (Rt == M[m])
        then M[m] = Rs; Rs = Rt;   /* do swap */
        else                       /* do not swap */

Assuming sequential consistency: MEMBARs not shown ...

    try:  LW   R3, head(R0)               ; Load queue head into R3
    spin: LW   R4, tail(R0)               ; Load queue tail into R4
          BEQ  R4, R3, spin               ; If queue empty, wait
          LW   R5, 0(R3)                  ; Read x from queue into R5
          ADDI R6, R3, 4                  ; Shift head by one word
          Compare&Swap R3, R6, head(R0)   ; Try to update head
          BNE  R3, R6, try                ; If not success, try again

If R3 != R6, another thread got here first, so we must try again.
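The retry loop above can be mimicked in Python. This is a sketch under loud assumptions: Python has no hardware Compare&Swap, so the hypothetical `Word.compare_and_swap` emulates atomicity with a lock and returns a success flag instead of swapping registers; the queue is indexed by element rather than by byte address; and an empty queue returns `None` instead of spinning.

```python
import threading


class Word:
    """One shared memory word with an emulated atomic compare-and-swap."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()   # stands in for hardware atomicity

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self.value == expected:
                self.value = new
                return True
            return False


def try_dequeue(queue, head, tail):
    """Return the next item, or None if the queue is empty."""
    while True:
        h = head.value                       # like LW R3, head(R0)
        if h == tail.value:                  # queue empty
            return None
        item = queue[h]                      # like LW R5, 0(R3)
        if head.compare_and_swap(h, h + 1):  # try to advance head
            return item
        # Another consumer advanced head first: retry with the new head.


queue = [10, 20, 30]
head, tail = Word(0), Word(len(queue))
consumed = []


def consumer():
    while (item := try_dequeue(queue, head, tail)) is not None:
        consumed.append(item)


threads = [threading.Thread(target=consumer) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(consumed))
```

Whatever the thread interleaving, each item is handed to exactly one consumer, because only one compare-and-swap per head value can succeed.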
If thread swaps out before Compare&Swap, no latency problem;

CS 152 Computer Architecture and Engineering
Lecture 14 - Cache Design and Coherence
2014-3-6

Writes from 10,000 feet ... for write-thru L1
[Figure: CPU0 and CPU1, each with a cache and a snooper, on a memory bus to the shared main memory hierarchy]
For write-thru caches ...
1. Writing CPU takes control of bus.
2. Address to be written is invalidated in all other caches. Reads will no longer hit in cache and get stale data.
3. Write is sent to main memory. Reads will cache miss, retrieve new value from main memory.
To a first-order, reads will “just work” if write-thru caches implement this policy. A “two-state” protocol (cache lines are “valid” or “invalid”).

CS 152 Computer Architecture and Engineering
Lecture 15 -- Advanced CPUs
2014-3-11

Split pipelines: a write-after-write hazard.
The pipeline splits after the RF stage, feeding functional units with different latencies.

    WAW Hazard
    DIV R1, R2, R3
    SUB R1, R2, R3

If long latency DIV and short latency SUB are sent to parallel pipes, SUB may finish first.
Solution: SUB detects R1 clash in decode stage and stalls, via a pipe-write scoreboard.
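A small timing model makes the hazard and the scoreboard fix concrete. The sketch below is illustrative only: the one-instruction-per-cycle issue model, the latencies (4 cycles for DIV, 1 for SUB), and the function name `writeback_order` are assumptions, not details from the lecture.

```python
def writeback_order(program, use_scoreboard):
    """program: list of (name, dest, latency); returns names in write-back order."""
    pending = {}   # dest register -> cycle at which its latest write lands
    writes = []    # (write_cycle, name)
    cycle = 0
    for name, dest, latency in program:
        if use_scoreboard and dest in pending:
            # Stall in decode until the earlier write to this register lands.
            cycle = max(cycle, pending[dest])
        done = cycle + latency
        pending[dest] = done
        writes.append((done, name))
        cycle += 1             # next instruction issues at least one cycle later
    return [name for _, name in sorted(writes)]


# Hypothetical latencies: long DIV, short SUB, both writing R1.
program = [("DIV", "R1", 4), ("SUB", "R1", 1)]

without_scoreboard = writeback_order(program, use_scoreboard=False)
with_scoreboard = writeback_order(program, use_scoreboard=True)
print(without_scoreboard, with_scoreboard)
```

Without the scoreboard, SUB writes R1 first and the later DIV write leaves the wrong final value; with it, SUB waits and the writes land in program order.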
Superscalar R machine
[Figure: datapath with stages IF (Fetch), ID (Decode), EX (ALU), MEM, WB; PC and Sequencer; Instr Mem (64-bit fetch); Instruction Issue Logic; RegFile with read ports rs1-rs4 / rd1-rd4 and write ports ws1, ws2 / wd1, wd2 enabled by WE1, WE2]

CS 152 Computer Architecture and Engineering
Lecture 17 -- Networks, Routers, Google
2014-3-20

6 key parameters scale across dimension of “by one server”, “by 80-server rack” and “by array”.
To get more DRAM and disk capacity, you must work on a scale larger than a single server. But as you do, latency and bandwidth degrade, because network performance << a server bus, and because array network is under-provisioned.
Exception: disk latency is roughly scale-independent.

CS 152 Computer Architecture and Engineering
Lecture 18 -- Dynamic Scheduling I
2014-4-1

Given an endless supply of registers ...
Rename “architected registers” (Ri, Fi) to new “physical registers” (PRi, PFi) on each write.

    ADDI R1,R0,64  ->  ADDI PR01,PR00,64
    R1 → PR01   F0 → PF00   F4,0(R1)

           LD   PF00, 0(PR01)
           ADDD PF04, PF00, PF02
           SD   PF04, 0(PR01)
           SUBI PR11, PR01, 8
           BEQZ PR11 ENDLOOP
    ITER2: LD   PF10, 0(PR11)
           ADDD PF14, PF10, PF02
           SD   PF14, 0(PR11)
           SUBI PR21, PR11, 8
           BEQZ PR21 ENDLOOP
    ITER3: LD   PF20, 0(PR21)
           [...]

What was gained? An instruction may execute once all of its source registers have been written.
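The renaming rule (a new physical register on each write, sources read the current mapping) can be sketched in a few lines. This is a minimal illustration, not the lecture's mechanism: the function `rename`, the tuple encoding, and the flat `P0, P1, ...` naming are assumptions that differ from the slide's PRxx/PFxx names, and immediates and branches are left out.

```python
from itertools import count


def rename(program):
    """Rename (op, dest, sources) tuples; dest is None for stores."""
    fresh = (f"P{i}" for i in count())   # endless supply of physical registers
    table = {}                           # architected register -> physical register
    renamed = []
    for op, dest, sources in program:
        phys_sources = []
        for src in sources:
            if src not in table:         # first use of a never-written register
                table[src] = next(fresh)
            phys_sources.append(table[src])
        phys_dest = None
        if dest is not None:
            phys_dest = next(fresh)      # new physical register on each write
            table[dest] = phys_dest
        renamed.append((op, phys_dest, phys_sources))
    return renamed


# One loop body, modeled on the slide's loop, unrolled twice.
body = [
    ("LD", "F0", ["R1"]),
    ("ADDD", "F4", ["F0", "F2"]),
    ("SD", None, ["F4", "R1"]),
    ("SUBI", "R1", ["R1"]),
]
renamed = rename(body * 2)
for instruction in renamed:
    print(instruction)
```

After renaming, the second iteration's LD writes a different physical register than the first, so it depends only on the SUBI that produced its address and can start before the first iteration's ADDD and SD finish.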
CS 152 Computer Architecture and Engineering
Lecture 19 -- Dynamic Scheduling II
2014-4-3

Rename stage close-up:
(1) Allocates new physical registers for destinations,
(2) Looks up physical register numbers for sources,
(3) Handles rename dependences within the 4 issuing instructions, in one clock cycle!
[Figure labels: For mis-speculation recovery; Timestamped]
Input: 4 instructions specifying architected registers.
Output: 12 physical register numbers: 1 destination and 2 sources for each of the 4 instructions to be issued.

CS 152 Computer Architecture and Engineering
Lecture 20 -- Dynamic Scheduling III
2014-4-8

Micro-op translation example ...

    ADC m32, r32:        // for a simple m32 address mode
    Becomes:
    LD T1 0(EBX);        // EBX register points to m32
    ADD T1, T1, CF;      // CF is carry flag from EFLAGS
    ADD T1, T1, r32;     // Add the specified register
    ST 0(EBX) T1;        // Store result back to m32

Instruction traces of IA-32 programs show most executed instructions require 4 or fewer micro-ops.
Translation for these ops is cast into logic gates, often over several pipeline cycles.

CS 152 Computer Architecture and Engineering
Lecture 21 -- Dataflow
2014-4-10

Dataflow stages of 21264
Idea: Write dataflow programs that reference physical registers, to execute on this machine.
Input: Instructions that reference physical registers.
Scoreboard: Tracks writes to physical registers.
CS 152 Computer Architecture and Engineering
Lecture 22 -- GPU + SIMD + Vectors I
2014-4-15

Pure data move opcode. Or, part of a math opcode.

CS 152 Computer Architecture and Engineering
Lecture 23 -- GPU + SIMD + Vectors II
2014-4-17

Assume MacBook Air ... 1386 x 768 screen ... We are all zoomed in on Google Maps.
Lets us cache a 1024 x 1024 window of the 11 PB Earth map in 34.7 MB!
Top pyramid image is 4K x 4K ...
Idea: Keep only a 1386 x 768 window of top images in RAM ...
Zoom all the way in ...
Bottom stack image shows the smallest part of the 1 mile sq. patch of the Earth of any stack image.
Graphics hardware displays bottom stack image, which fills MacBook Air display.
Hardware interpolation of stack levels.
[Figure axis labels: units of pixels; units of sq. miles; units of miles]

CS 152 Computer Architecture and Engineering
Lecture 24 -- Voxel Processing
2014-4-22

After processing ...
A 3-D matrix of cubes, in object space (X,Y,Z).
8-bit density value stored for each cube (0 = “air”).
256^3 = 16 MB = 10 inch cube (for 1 mm voxels)
0.125 mm voxels? 8 GB
Interesting to computer architects because n^3 grows so quickly!

CS 152 Computer Architecture and Engineering
Lecture 25 -- Digital Imaging
2014-4-24

Camera interface to the outside world
Simple Power Hookup
Serial port to control the camera.
8-bit Dout Port
54 MHz Clk
1280 x 1024 @ 15 fps
640 x 512 @ 30 fps
YCrCb 4:2:2

AWARE-2: Array of 98 phone camera modules (14 M-pixel)
1.3 G-pixel camera @ 3 frames/sec

On Thursday
Mid-term II ...
Ground rules ...

Mid-term: How to do well ...
Problem intro often features a lecture slide. If you have to teach yourself that slide during the test, you’re starting out behind.
Getting the problem correct requires thinking on your feet to do a new design or analyze one given to you.
There will not be “you can only get it if you do the reading” problems ... but the reading helps you understand how to think through the problem.

Mid-term: There may be math ...
No memorization: If we ask about Amdahl’s Law, we will show its definition lecture slide.
Understanding is needed: A problem may require you to apply an equation to a design, etc.
Cannot use electronic devices.
You may need to do: simple algebra and calculus, add a few numbers by hand, etc.
... more administrative info after we do some content.

When is it? Where is it? Ground rules.
9:30 AM sharp, Thursday May 1st, 306 Soda.
Every-other-seat seating, except for the front rows, where every-seat is permitted.
No blue-books needed. We will be handing out a paper test. Pencil is preferred.
Pencils down @ 10:55 AM, so we can collect papers before next class comes in.

When is it? Where is it? Ground rules.
No use of calculators, smartphones, laptops, etc ... during the exam.
Closed-book, closed-notes. Just pencils, erasers.
No consulting with students.
Restroom breaks are OK, but you’ll still need to hand in your exam @ 10:55.
Questions are reserved for serious concerns about a bug in the question.
Today - Midterm II Review Session
Study Tips
HW 2, problem by problem (if there is time)
HKN

On Thursday
Mid-term II ...
See you there!