Memory Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Big Picture: Building a Processor memory +4 inst register file +4 =? PC control offset new pc alu cmp target imm extend A Single cycle processor addr din dout memory Goals for today Review • Finite State Machines Memory • • • • Register Files Tri-state devices SRAM (Static RAM—random access memory) DRAM (Dynamic RAM) Which statement(s) is true (A) In a Moore Machine output depends on both current state and input (B) In a Mealy Machine output depends on current state and input (C) In a Mealy Machine output depends on next state and input (D) All the above are true (E) None are true Mealy Machine Registers General Case: Mealy Machine Current State Input Comb. Logic Output Next State Outputs and next state depend on both current state and input Moore Machine Registers Special Case: Moore Machine Current State Comb. Logic Output Input Comb. Logic Next State Outputs depend only on current state Example #2: Digital Door Lock Digital Door Lock Inputs: • keycodes from keypad • clock Outputs: • “unlock” signal • display how many keys pressed so far Door Lock: Inputs Assumptions: • signals are synchronized to clock • Password is B-A-B K A B K A B Meaning 0 0 0 Ø (no key) 1 1 0 ‘A’ pressed 1 0 1 ‘B’ pressed Door Lock: Outputs Assumptions: • High pulse on U unlocks door D3D2D1D0 4 LED 8 dec U Strategy: (1) Draw a state diagram (e.g. Moore Machine) (2) Write output and next-state tables (3) Encode states, inputs, and outputs as bits (4) Determine logic equations for next state and outputs Door Lock: Simplified State Diagram Ø Ø G1 ”1” “A” “B” G2 ”2” else “B” else G3 ”3”, U any Idle ”0” Ø else any B1 ”1” else B2 ”2” Ø else B3 ”3” Ø (1) Draw a state diagram (e.g. Moore Machine) Door Lock: Simplified State Diagram Ø Ø G1 ”1” “A” else “B” G2 ”2” else “B” G3 ”3”, U any Idle ”0” Ø else else B1 ”1” else B2 ”2” Ø Ø (1) Draw a state diagram (e.g. Moore Machine) Door Lock: Simplified State Diagram Ø Ø G1 ”1” “A” else “B” G2 ”2” else “B” Cur. State Idle ”0” Ø else else B1 ”1” else B2 ”2” Ø Ø (2) Write output and next-state tables G3 ”3”, U any Output Door Lock: Simplified State Diagram Ø Ø G1 ”1” “A” else “B” G2 ”2” else Idle ”0” Ø else else B1 ”1” else B2 ”2” Ø Ø “B” Cur. State Idle G1 G2 G3 B1 B2 (2) Write output and next-state tables G3 ”3”, U any Output “0” “1” “2” “3”, U “1” “2” Door Lock: Simplified State Diagram Ø Cur. State Ø G1 ”1” “A” else “B” G2 ”2” “B” else Idle ”0” Ø else else B1 ”1” else B2 ”2” Ø Ø (2) Write output and next-state tables Input Next State G3 ”3”, U any Door Lock: Simplified State Diagram Ø G1 ”1” “A” else “B” Idle ”0” Ø else else B1 ”1” else Cur. State Ø Idle G2 Idle “B” ”2”Idle elseG1 G1 G1 G2 G2 G2 G3 B2 B1 ”2” B1 B2 Ø B2 Ø (2) Write output and next-state tables Input Next State Ø Idle “B”G3 G1 ”3”, U “A” B1 Ø any G1 “A” G2 “B” B2 Ø B2 “B” G3 “A” Idle any Idle Ø B1 K B2 Ø B2 K Idle State Table Encoding SCur. SState S0 2 1 0 Idle 0 0 0 G1 0 1 0 G2 1 0 0 G3 1 1 1 B1 0 0 1 B2 0 1 D3 0 0 0 0 0 0 DOutput 2 D1 D0 0 “0” 0 0 0 “1” 0 1 0 “2” 1 0 0“3”, 1 U1 0 “1” 0 1 0 “2” 1 0 U 0 0 0 1 0 0 Cur. S2 SState 1 S0 0 Idle 0 0 0 Idle 0 0 0 Idle 0 0 0 G1 0 1 0 G1 0 1 0 G1 0 1 0 G2 1 0 0 G2 1 0 0 G2 1 0 0 G3 1 1 1 B1 0 0 1 B1 0 0 1 B2 0 1 1 B2 0 1 K Input A B 0 Ø 0 0 1 “B” 0 1 1 “A” 1 0 0 Ø 0 0 1 “A” 1 0 1 “B” 0 1 0 Ø 0 0 1 “B” 0 1 1 “A” 1 0 x any x x 0 Ø 0 0 1 K x x 0 Ø 0 0 1 K x x State S2 S 1 8 S0 4 Meaning K A B D3D2D1D0 dec 0 0 Idle 0 0 0 0 Ø (no key) U 0 0 1 G1 1 1 0 ‘A’ pressed K 0 G2 0 1 1 0 1 ‘B’ pressed A G3 0 1 1 B B1 1 0 0 (3) Encode states, and outputs as bits B2 1 0 inputs, 1 S’ Next 2 S’State 1 S’0 0 Idle 0 0 0 G1 0 1 1 B1 0 0 0 G1 0 1 0 G2 1 0 1 B2 0 1 0 B2 1 0 0 G3 1 1 0 Idle 0 0 0 Idle 0 0 1 B1 0 0 1 B2 0 1 1 B2 0 1 0 Idle 0 0 3bit Reg S2-0 D3-0 4 dec Door Lock: Implementation U clk S2-0 K A S’2-0 B U = S2S1S0 D0 = S2S1S0 + S2S1S0 + S2 S1S0 D1 = S2S1S0 + S2S1S0 + S2S1S0 S2 0 0 0 0 1 1 S1 0 0 1 1 0 0 S0 0 1 0 1 0 1 D3 0 0 0 0 0 0 D2 0 0 0 0 0 0 D1 0 0 1 1 0 1 D0 0 1 0 1 1 0 (4) Determine logic equations for next state and outputs U 0 0 0 1 0 0 Door Lock: Implementation 3bit Reg clk S2-0 K A B K 0 1 1 0 1 1 0 1 1 x 0 1 0 1 dec S2-0 S2 S1 S0 4 D03-00 0 0 0 0 U 0 0 0 0 0 1 0 0 1 0 0 1 0S’2-0 1 0 0 1 0 0 1 0 0 1 1 1 0 0 1 0 0 1 0 1 1 0 1 A 0 0 1 0 1 0 0 0 1 x 0 x 0 x B 0 1 0 0 0 1 0 1 0 x 0 x 0 x S’2 S’1 S’0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 1 0 1 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 1 0 1 0 0 0 S0’ = ? S1’ = ? S2’ = S2S1S0KAB + S2S1S0KAB + S2S1S2KAB + S2S1S0K + S2 S1S0 KAB 3bit Reg S2-0 D3-0 4 dec Door Lock: Implementation U clk S2-0 K A S’2-0 B Strategy: (1) Draw a state diagram (e.g. Moore Machine) (2) Write output and next-state tables (3) Encode states, inputs, and outputs as bits (4) Determine logic equations for next state and outputs Administrivia Make sure to go to your Lab Section this week • • • • • Find project partners this week (for upcoming project1 next week) Lab2 due in class this week (it is not homework) Design Doc for Lab1 due yesterday, Monday, Feb 4th Completed Lab1 due next week, Monday, Feb 11th Work alone Homework1 is due Wednesday • Work alone • Academic Integrity BUT, use your resources • Lab Section, Piazza.com, Office Hours, Homework Help Session, • Class notes, book, Sections, CSUGLab Second C Primer: Thursday, B14 Hollister, 6-8pm Administrivia Check online syllabus/schedule • http://www.cs.cornell.edu/Courses/CS3410/2013sp/schedule.html Slides and Reading for lectures Office Hours Homework and Programming Assignments Prelims (in evenings): • Tuesday, February 26th • Thursday, March 28th • Thursday, April 25th Schedule is subject to change Collaboration, Late, Re-grading Policies “Black Board” Collaboration Policy • Can discuss approach together on a “black board” • Leave and write up solution independently • Do not copy solutions Late Policy • Each person has a total of four “slip days” • Max of two slip days for any individual assignment • Slip days deducted first for any late assignment, cannot selectively apply slip days • For projects, slip days are deducted from all partners • 25% deducted per day late after slip days are exhausted Regrade policy • Submit written request to lead TA, and lead TA will pick a different grader • Submit another written request, lead TA will regrade directly • Submit yet another written request for professor to regrade. Lab1: Thoughts thus far Great experience! But, read all instructions (multiple times)! Sign up for design doc meetings early • We were very lenient for Lab1 • In future, CMS will lock the Friday after release • lose points for not signing up before lock, canceling last minute, and/or not showing up Design doc meeting • Not the same as office hours You are presenting your current design during design doc meeting • Bring printout of your design document • TA will grade and give feedback based on your presentation of your design Goals for today Memory • • • • CPU: Register Files (i.e. Memory w/in the CPU) Scaling Memory: Tri-state devices Cache: SRAM (Static RAM—random access memory) Memory: DRAM (Dynamic RAM) Goal: How do we store results from ALU computations? How do we use stored results in subsequent operations? Register File How does a Register File work? How do we design it? Big Picture: Building a Processor memory +4 inst register file +4 =? PC control offset new pc alu cmp target imm extend A Single cycle processor addr din dout memory Register File Register File • N read/write registers • Indexed by register number 32 DW Dual-Read-Port QA Single-Write-Port Q B 32 x 32 Register File W 1 RW RA RB 5 5 5 32 32 Register File D0 D1 D2 Recall: Register •D flip-flops in parallel •shared clock •extra clocked inputs: write_enable, reset, … D3 4 clk 4-bit reg 4 clk Register File Register File 32 D • N read/write registers • Indexed by register number Reg 0 Reg 1 5-to-32 decoder 5 …. Reg 30 Reg 31 RW W How to write to one register in the register file? • Need a decoder Activity# write truth table for 3-to-8 decoder Register File 32 D • N read/write registers • Indexed by register number Reg 0 Reg 1 5-to-32 decoder 5 …. Reg 30 Reg 31 RW W How to write to one register in the register file? • Need a decoder Register File Register File • N read/write registers • Indexed by register number Reg 0 Reg 1 …. 32 …. Reg 30 Reg 31 …. M U X M U X How to read from two registers? • Need a multiplexor 5 RA 5 RB 32 QA 32 QB Register File Register File 32 D • N read/write registers • Indexed by register number Reg 0 Reg 1 5-to-32 decoder …. …. Reg 30 Reg 31 Implementation: • D flip flops to store bits • Decoder for each write port • Mux for each read port 32 …. 5 5 W RW RA M U X M U X 5 RB 32 QA 32 QB Register File Register File • N read/write registers • Indexed by register number 32 DW Dual-Read-Port Implementation: • D flip flops to store bits • Decoder for each write port • Mux for each read port QA Single-Write-Port Q B 32 x 32 Register File W 1 RW RA RB 5 5 5 32 32 Register File Register File What happens if same • N read/write registers register read and written • Indexed by during same clock cycle? register number Implementation: • D flip flops to store bits • Decoder for each write port • Mux for each read port Tradeoffs Register File tradeoffs 8-to-1 mux a + Very fast (a few gate delays for b both read and write) c + Adding extra ports is d straightforward e – Doesn’t scale f e.g. 32MB register file with g 32 bit registers Need 32x 1M-to-1 multiplexor h and 32x 10-to-1M decoder How many logic gates/transistors? s2s1s0 Takeway Register files are very fast storage (only a few gate delays), but does not scale to large memory sizes. Goals for today Memory • • • • CPU: Register Files (i.e. Memory w/in the CPU) Scaling Memory: Tri-state devices Cache: SRAM (Static RAM—random access memory) Memory: DRAM (Dynamic RAM) Next Goal How do we scale/build larger memories? Building Large Memories Need a shared bus (or shared bit line) • Many FlipFlops/outputs/etc. connected to single wire • Only one output drives the bus at a time D0 S0 D1 S1 D2 S2 D3 S3 D1023 S1023 shared line • How do we build such a device? Tri-State Devices Tri-State Buffers • If enabled (E=1), then Q = D • Otherwise, Q is not connected (z = high impedance) E D Q E 0 0 1 1 D Q 0 z 1 z 0 0 1 1 Activity#2: Tri-State Buffer from an Inverter Tri-State Buffers • If enabled (E=1), then Q = D • Otherwise, Q is not connected (z = high impedance) E Vsupply D Q E 0 0 1 1 D Q 0 z 1 z 0 0 1 1 D Q Gnd Tri-State Devices Tri-State Buffers • If enabled (E=1), then Q = D • Otherwise, Q is not connected (z = high impedance) E D E Q D E 0 0 1 1 D Q 0 z 1 z 0 0 1 1 Vsupply D Q Gnd Tri-State Devices Tri-State Buffers • If enabled (E=1), then Q = D • Otherwise, Q is not connected (z = high impedance) E D Vsupply E Q 0 1 D E 0 0 1 1 D Q 0 z 1 z 0 0 1 1 0 off 0 A B AND NAND A B OR NOR 0 0 0 1 0 0 0 1 0 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 off Gnd Q z Tri-State Devices Tri-State Buffers • If enabled (E=1), then Q = D • Otherwise, Q is not connected (z = high impedance) E D E Q 1 D E 0 0 1 1 D Q 0 z 1 z 0 0 1 1 Vsupply 0 1 0 1 off 1 on 0 A B AND NAND A B OR NOR 0 0 0 1 0 0 0 1 0 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 Gnd Q 0 Tri-State Devices Tri-State Buffers • If enabled (E=1), then Q = D • Otherwise, Q is not connected (z = high impedance) E D E Q 1 D E 0 0 1 1 D Q 0 z 1 z 0 0 1 1 Vsupply 1 0 1 1 on 0 off 1 A B AND NAND A B OR NOR 0 0 0 1 0 0 0 1 0 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 1 1 0 1 1 1 0 Gnd Q 1 Shared Bus D0 S0 D1 S1 D2 S2 D3 S3 D1023 S1023 shared line Takeway Register files are very fast storage (only a few gate delays), but does not scale to large memory sizes. Tri-state Buffers allow scaling since multiple registers can be connected to a single output, while only one register actually drives the output. Goals for today Memory • • • • CPU: Register Files (i.e. Memory w/in the CPU) Scaling Memory: Tri-state devices Cache: SRAM (Static RAM—random access memory) Memory: DRAM (Dynamic RAM) Next Goal How do we build large memories? Use similar designs as Tri-state Buffers to connect multiple registers to output line. Only one register will drive output line. SRAM Static RAM (SRAM)—Static Random Access Memory Decoder Address • Essentially just D-Latches plus Tri-State Buffers • A decoder selects which line of memory to access Data (i.e. word line) • A R/W selector determines the type of access • That line is then coupled to the data lines SRAM Static RAM (SRAM)—Static Random Access Memory • Essentially just D-Latches plus Tri-State Buffers • A decoder selects which line of memory to access (i.e. word line) • A R/W selector determines the 22 Address type of access • That line is then coupled to SRAM 8 8 4M x 8 the data lines Din Dout Chip Select Write Enable Output Enable SRAM Din[1] E.g. How do we design a 4 x 2 SRAM Module? Din[2] D Q D Q enable enable D Q D Q enable enable 0 (i.e. 4 word lines that are each 2 bits wide)? Address 2-to-4 decoder 1 2 D Q 4 x 2 SRAM D Q enable enable D Q D Q enable enable 2 3 Write Enable Output Enable Dout[1] Dout[2] SRAM Din[1] E.g. How do we design a 4 x 2 SRAM Module? Din[2] D Q D Q enable enable D Q D Q enable enable D Q D Q enable enable D Q D Q enable enable 0 (i.e. 4 word lines that are each 2 bits wide)? Address 2-to-4 decoder 1 2 2 3 Write Enable Output Enable Dout[1] Dout[2] SRAM Bit line Din[1] Word line E.g. How do we design a 4 x 2 SRAM Module? Din[2] D Q D Q enable enable D Q D Q enable enable D Q D Q enable enable D Q D Q enable enable 0 (i.e. 4 word lines that are each 2 bits wide)? Address 2-to-4 decoder 1 2 2 3 Write Enable Output Enable Dout[1] Dout[2] Typical SRAM Cell B Pass-Through Transistors bit line SRAM Cell B word line Each cell stores one bit, and requires 4 – 8 transistors (6 is typical) Typical SRAM Cell 2) Enable(wordline (wordline==0) 1) Disabled 1 1) Pre-charge B = Vsupply/2 3) Cell pulls B high i.e. B = 1 on off B bit line SRAM Cell word line 0 on off 1) Pre-charge B = Vsupply/2 3) Cell pulls B low i.e. B = 0 B Each cell stores one bit, and requires 4 – 8 transistors (6 is typical) Read: • pre-charge B and B to Vsupply/2 • pull word line high • cell pulls B or B low, sense amp detects voltage difference Typical SRAM Cell 1) Enable(wordline (wordline==0) 1) Disabled 1→0 on off 2) Drive B low i.e. B = 0 B bit line SRAM Cell word line 0 →1 on off 2) Drive B high i.e. B = 1 B Each cell stores one bit, and requires 4 – 8 transistors (6 is typical) Read: • pre-charge B and B to Vsupply/2 • pull word line high • cell pulls B or B low, sense amp detects voltage difference Write: • pull word line high • drive B and B to flip cell SRAM Bit line Din[1] Word line E.g. How do we design a 4 x 2 SRAM Module? Din[2] D Q D Q enable enable D Q D Q enable enable D Q D Q enable enable D Q D Q enable enable 0 (i.e. 4 word lines that are each 2 bits wide)? Address 2-to-4 decoder 1 2 2 3 Write Enable Output Enable Dout[1] Dout[2] SRAM Din[1] E.g. How do we design a 4 x 2 SRAM Module? Din[2] D Q D Q enable enable D Q D Q enable enable 0 (i.e. 4 word lines that are each 2 bits wide)? Address 2-to-4 decoder 1 2 D Q 4 x 2 SRAM D Q enable enable D Q D Q enable enable 2 3 Write Enable Output Enable Dout[1] Dout[2] SRAM Din E.g. How do we design a 4M x 8 SRAM Module? 8 (i.e. 4M word lines that are each 8 bits wide)? 22 Address 4M x 8 SRAM Chip Select Write Enable Output Enable Dout 8 SRAM E.g. How do we design a 4M x 8 SRAM Module? 4M x 8 SRAM Address [21-10] Address [9-0] 12 10 12 x 4096 decoder 4k x 4k x 4k x 4k x 4k x 4k x 4k x 4k x 1024 1024 1024 1024 1024 1024 1024 1024 SRAM SRAMSRAM SRAMSRAM SRAMSRAM SRAM 1024 1024 1024 mux mux mux 1 1 1 1024 1024 1024 1024 1024 mux mux mux mux mux 1 1 1 1 1 Dout[7]Dout[6] Dout[5]Dout[4] Dout[3]Dout[2] Dout[1]Dout[0] SRAM E.g. How do we design a 4M x 8 SRAM Module? 4M x 8 SRAM Row decoder Address [21-10] Address [9-0] 12 10 Chip Select (CS) R/W Enable 4k x 4k x 4k x 4k x 4k x 4k x 4k x 4k x 1024 1024 1024 1024 1024 1024 1024 1024 SRAM SRAMSRAM SRAMSRAM SRAMSRAM SRAM 1024 1024 1024 1024 1024 1024 1024 column selector, sense amp, and I/O circuits 8 Shared Data Bus 1024 SRAM Modules and Arrays 4M x 8 SRAM 4M x 8 SRAM 4M x 8 SRAM 4M x 8 SRAM R/W A21-0 CS msb lsb Bank 2 CS Bank 3 CS Bank 4 CS SRAM Summary SRAM •A few transistors (~6) per cell •Used for working memory (caches) •But for even higher density… Dynamic-RAM (DRAM) • Data values require constant refresh bit line Dynamic RAM: DRAM Capacitor Gnd Each cell stores one bit, and requires 1 transistors word line Dynamic-RAM (DRAM) Pass-Through Transistors • Data values require constant refresh bit line Dynamic RAM: DRAM Capacitor Gnd Each cell stores one bit, and requires 1 transistors word line Dynamic-RAM (DRAM) Disabled 2) Enable(wordline (wordline==0) 1) 0 Capacitor off on bit line Dynamic RAM: DRAM word line 1) Pre-charge B = Vsupply/2 3) Cell pulls B low i.e. B = 0 Gnd Each cell stores one bit, and requires 1 transistors Read: • pre-charge B and B to Vsupply/2 • pull word line high • cell pulls B low, sense amp detects voltage difference Dynamic-RAM (DRAM) Disabled 1) Enable(wordline (wordline==0) 1) bit line Dynamic RAM: DRAM word line 0 →1 Capacitor Gnd on off 2) Drive B high i.e. B = 1 Charges capacitor Each cell stores one bit, and requires 4 – 8 transistors (6 is typical) Read: • pre-charge B and B to Vsupply/2 • pull word line high • cell pulls B or B low, sense amp detects voltage difference Write: • pull word line high • drive B charges capacitor DRAM vs. SRAM Single transistor vs. many gates • Denser, cheaper ($30/1GB vs. $30/2MB) • But more complicated, and has analog sensing Also needs refresh • • • • Read and write back… …every few milliseconds Organized in 2D grid, so can do rows at a time Chip can do refresh internally Hence… slower and energy inefficient Memory Register File tradeoffs + + – – Very fast (a few gate delays for both read and write) Adding extra ports is straightforward Expensive, doesn’t scale Volatile Volatile Memory alternatives: SRAM, DRAM, … – Slower + Cheaper, and scales well – Volatile Non-Volatile Memory (NV-RAM): Flash, EEPROM, … + Scales well – Limited lifetime; degrades after 100000 to 1M writes Summary We now have enough building blocks to build machines that can perform non-trivial computational tasks Register File: Tens of words of working memory SRAM: Millions of words of working memory DRAM: Billions of words of working memory NVRAM: long term storage (usb fob, solid state disks, BIOS, …) Next time we will build a simple processor!