CS252 Graduate Computer Architecture
Lecture 16: Memory Technology (Con't), Error Correction Codes
John Kubiatowicz
Electrical Engineering and Computer Sciences, University of California, Berkeley
http://www.eecs.berkeley.edu/~kubitron/cs252

Review: 12 Advanced Cache Optimizations
• Reducing hit time
  1. Small and simple caches
  2. Way prediction
  3. Trace caches
• Increasing cache bandwidth
  4. Pipelined caches
  5. Multibanked caches
  6. Nonblocking caches
• Reducing miss penalty
  7. Critical word first
  8. Merging write buffers
• Reducing miss rate
  9. Victim cache
  10. Hardware prefetching
  11. Compiler prefetching
  12. Compiler optimizations

Review: Main Memory Background
• Performance of main memory:
  – Latency: cache miss penalty
    » Access time: time between request and word arriving
    » Cycle time: time between requests
  – Bandwidth: I/O & large-block miss penalty (L2)
• Main memory is DRAM: Dynamic Random Access Memory
  – Dynamic since it needs to be refreshed periodically (8 ms, ~1% of time)
  – Addresses divided into 2 halves (memory as a 2D matrix):
    » RAS, or Row Address Strobe
    » CAS, or Column Address Strobe
• Cache uses SRAM: Static Random Access Memory
  – No refresh (6 transistors/bit vs. 1 transistor/bit)
  – Size: DRAM/SRAM ~4-8x; cost & cycle time: SRAM/DRAM ~8-16x

DRAM Architecture
[Figure: 2^N-row x 2^M-column array of one-bit memory cells; N+M address bits; a row address decoder drives the word lines, and the bit lines feed the column decoder & sense amplifiers that deliver data bit D]
• Bits stored in 2-dimensional arrays on chip
• Modern chips have around 4 logical banks on each chip
  – Each logical bank physically implemented as many smaller arrays

Review: 1-T Memory Cell (DRAM)
• Write:
  1. Drive bit line
  2. Select row
• Read:
  1. Precharge bit line to Vdd/2
  2. Select row
  3. Cell and bit line share charges
     » Very small voltage changes on the bit line
  4. Sense (fancy sense amp)
     » Can detect changes of ~1 million electrons
  5. Write: restore the value
• Refresh:
  1. Just do a dummy read to every cell

DRAM Capacitors: more capacitance in a small area
• Trench capacitors:
  – Logic ABOVE capacitor
  – Gain in surface area of capacitor
  – Better scaling properties
  – Better planarization
• Stacked capacitors:
  – Logic BELOW capacitor
  – Gain in surface area of capacitor
  – 2-dim cross-section quite small

DRAM Operation: Three Steps
• Precharge
  – Charges bit lines to a known value; required before the next row access
• Row access (RAS)
  – Decode row address, enable addressed row (often multiple Kb in row)
  – Bit lines share charge with storage cell
  – Small change in voltage detected by sense amplifiers, which latch the whole row of bits
  – Sense amplifiers drive bit lines full rail to recharge storage cells
• Column access (CAS)
  – Decode column address to select a small number of sense-amplifier latches (4, 8, 16, or 32 bits depending on DRAM package)
  – On read, send latched bits out to chip pins
  – On write, change sense-amplifier latches, which then charge storage cells to the required value
  – Can perform multiple column accesses on the same row without another row access (burst mode)
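The three steps above are easy to see in code. Below is a minimal sketch (ours, not from the lecture, with made-up latency numbers) of a single DRAM bank as a tiny state machine: a read that hits the currently open row pays only the column access, while a read to a different row pays precharge + row access + column access.

    #include <stdio.h>

    /* Illustrative sketch with assumed latencies: one DRAM bank tracks
     * which row its sense amplifiers currently hold. */
    enum { ROW_CLOSED = -1 };
    enum { T_PRECHARGE = 3, T_RAS = 4, T_CAS = 4 };  /* hypothetical cycles */

    typedef struct {
        int open_row;   /* row currently latched in the sense amplifiers */
    } dram_bank;

    static int bank_read_latency(dram_bank *b, int row)
    {
        if (b->open_row == row)
            return T_CAS;                       /* column access only (open-row hit) */
        int lat = (b->open_row == ROW_CLOSED)
                ? T_RAS + T_CAS                 /* bank already precharged */
                : T_PRECHARGE + T_RAS + T_CAS;  /* must close the old row first */
        b->open_row = row;                      /* sense amps now hold this row */
        return lat;
    }

    int main(void)
    {
        dram_bank b = { ROW_CLOSED };
        printf("first read, row 5:  %d cycles\n", bank_read_latency(&b, 5));
        printf("same row, new col:  %d cycles\n", bank_read_latency(&b, 5));
        printf("different row 9:    %d cycles\n", bank_read_latency(&b, 9));
        return 0;
    }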
DRAM Read Timing (Example)
• Every DRAM access begins with the assertion of RAS_L
• 2 ways to read: early or late v. CAS
[Timing diagram: 256K x 8 DRAM with 9 address pins and 8 data pins; one DRAM read cycle shows RAS_L, CAS_L, the multiplexed row/column address, WE_L, OE_L, and data going from high-Z to Data Out after the read access time plus output-enable delay]
• Early read cycle: OE_L asserted before CAS_L
• Late read cycle: OE_L asserted after CAS_L

Main Memory Performance
• DRAM (read/write) cycle time >> DRAM (read/write) access time
  – ~2:1; why?
• DRAM (read/write) cycle time:
  – How frequently can you initiate an access?
  – Analogy: a little kid can only ask his father for money on Saturday
• DRAM (read/write) access time:
  – How quickly will you get what you want once you initiate an access?
  – Analogy: as soon as he asks, his father will give him the money
• DRAM bandwidth limitation analogy:
  – What happens if he runs out of money on Wednesday?

Increasing Bandwidth - Interleaving
• Access pattern without interleaving: with one memory bank, the CPU must wait until D1 is available before starting the access for D2
• Access pattern with 4-way interleaving: accesses to banks 0, 1, 2, and 3 start back-to-back; by the time bank 3 has been accessed, bank 0 is ready to be accessed again

Main Memory Performance
• Simple:
  – CPU, cache, bus, memory all the same width (32 bits)
• Wide:
  – CPU/Mux 1 word; Mux/cache, bus, memory N words (Alpha: 64 bits & 256 bits)
• Interleaved:
  – CPU, cache, bus 1 word; memory N modules (4 modules); example is word interleaved

Main Memory Performance
• Timing model:
  – 1 cycle to send address
  – 4 for access time, 10 cycle time, 1 to send data
  – Cache block is 4 words
• Simple M.P.      = 4 x (1 + 10 + 1) = 48
• Wide M.P.        = 1 + 10 + 1      = 12
• Interleaved M.P. = 1 + 10 + 1 + 3  = 15
• Word-interleaved address assignment:
  – Bank 0: 0, 4, 8, 12
  – Bank 1: 1, 5, 9, 13
  – Bank 2: 2, 6, 10, 14
  – Bank 3: 3, 7, 11, 15

Avoiding Bank Conflicts
• Lots of banks:
    int x[256][512];
    for (j = 0; j < 512; j = j+1)
      for (i = 0; i < 256; i = i+1)
        x[i][j] = 2 * x[i][j];
• Even with 128 banks, since 512 is a multiple of 128, the word accesses conflict
• SW: loop interchange, or declaring the array dimension not a power of 2 ("array padding")
• HW: prime number of banks
  – bank number = address mod number of banks
  – address within bank = address / number of banks
  – modulo & divide on every memory access with a prime number of banks?

Finding Bank Number and Address within a bank
• Problem: determine the number of banks, Nb, and the number of words in each bank, Wb, such that:
  – given address x, it is easy to find the bank where x will be found, B(x), and the address of x within the bank, A(x)
  – for any address x, B(x) and A(x) are unique
  – the number of bank conflicts is minimized
• Solution: use the following relation to determine B(x) and A(x):
    B(x) = x MOD Nb
    A(x) = x MOD Wb
  where Nb and Wb are co-prime (no common factors)
  – The Chinese Remainder Theorem shows that B(x) and A(x) are unique
• The condition is satisfied if Nb is a prime of the form 2^m - 1:
  – Since 2^k = 2^(k-m) * (2^m - 1) + 2^(k-m), we get 2^k MOD Nb = 2^(k-m) MOD Nb = ... = 2^j with j < m
  – And remember that (A + B) MOD C = [(A MOD C) + (B MOD C)] MOD C
• Simple circuit for x MOD Nb (see the sketch after this slide):
  – For every power of 2, compute its single-bit MOD in advance
  – B(x) = sum of these values MOD Nb (low-complexity circuit: adder with ~m bits)
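As a concrete illustration of that circuit, here is a small C sketch (ours; the function names and test harness are hypothetical) computing B(x) = x mod 7 for Nb = 2^3 - 1 = 7. Because 2^3 ≡ 1 (mod 7), each 3-bit chunk of the address contributes its own value mod 7, so the chunks can simply be added and the small sum folded, which is exactly what the adder circuit above does.

    #include <stdint.h>
    #include <stdio.h>

    /* Bank number B(x) = x mod 7, computed without a divider:
     * sum the 3-bit chunks (each chunk's weight 8^i ≡ 1 mod 7),
     * then fold the sum back into the range 0..6. */
    static unsigned bank_mod7(uint32_t x)
    {
        uint32_t s = 0;
        for (; x != 0; x >>= 3)
            s += x & 7;              /* sum of chunks ≡ x (mod 7) */
        while (s > 7)
            s = (s >> 3) + (s & 7);  /* fold until the sum fits in 0..7 */
        return s == 7 ? 0 : s;       /* 7 ≡ 0 (mod 7) */
    }

    int main(void)
    {
        const uint32_t Wb = 1024;    /* words per bank: a power of 2, co-prime with 7 */
        for (uint32_t x = 0; x < 100000; x++) {
            unsigned B = bank_mod7(x);   /* bank number  B(x) = x mod 7  */
            uint32_t A = x & (Wb - 1);   /* bank address A(x) = x mod Wb */
            (void)A;
            if (B != x % 7) {            /* sanity-check against '%'     */
                printf("mismatch at %u\n", x);
                return 1;
            }
        }
        printf("bank_mod7 agrees with %% for all tested addresses\n");
        return 0;
    }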
Quest for DRAM Performance
1. Fast page mode
  – Add timing signals that allow repeated accesses to the row buffer without another row access time
  – Such a buffer comes naturally, as each array will buffer 1024 to 2048 bits for each access
2. Synchronous DRAM (SDRAM)
  – Add a clock signal to the DRAM interface, so that repeated transfers do not bear the overhead of synchronizing with the DRAM controller
3. Double Data Rate (DDR SDRAM)
  – Transfer data on both the rising and falling edges of the DRAM clock signal, doubling the peak data rate
  – DDR2 lowers power by dropping the voltage from 2.5 to 1.8 volts + offers higher clock rates: up to 400 MHz
  – DDR3 drops to 1.5 volts + higher clock rates: up to 800 MHz
• Improved bandwidth, not latency

Fast Memory Systems: DRAM specific
• Multiple CAS accesses: several names (page mode)
  – Extended Data Out (EDO): 30% faster in page mode
• Newer DRAMs to address the gap; what will they cost, will they survive?
  – RAMBUS: startup company; reinvented the DRAM interface
    » Each chip a module vs. a slice of memory
    » Short bus between CPU and chips
    » Does own refresh
    » Variable amount of data returned
    » 1 byte / 2 ns (500 MB/s per chip)
  – Synchronous DRAM: 2 banks on chip, a clock signal to DRAM, transfers synchronous to the system clock (66 - 150 MHz)
    » DDR DRAM: two transfers per clock (on rising and falling edge)
  – Intel claims FB-DIMM is the next big thing
    » Stands for "Fully-Buffered Dual-Inline Memory Module"
    » Same basic technology as DDR, but utilizes a serial "daisy-chain" channel between different memory components

Fast Page Mode Operation
• Regular DRAM organization:
  – N rows x N columns x M bits
  – Read & write M bits at a time
  – Each M-bit access requires a RAS/CAS cycle
• Fast Page Mode DRAM:
  – Adds an N x M "SRAM" register to save a row
• After a row is read into the register:
  – Only CAS is needed to access other M-bit blocks on that row
  – RAS_L remains asserted while CAS_L is toggled
[Timing diagram: one row address with RAS_L held low, then 1st through 4th M-bit accesses, each supplying only a column address as CAS_L toggles]

SDRAM timing (Single Data Rate)
[Timing diagram: RAS to a new bank, CAS after the CAS latency, precharge, and a burst READ]
• Micron 128 Mbit DRAM (using the 2 Meg x 16 bit x 4 bank version)
  – Row (12 bits), bank (2 bits), column (9 bits)

Double-Data Rate (DDR2) DRAM
[Timing diagram from the Micron 256 Mb DDR2 SDRAM datasheet: 200 MHz clock; row activate, column read, precharge, next row activate; 400 Mb/s data rate per pin]

DDR vs DDR2 vs DDR3
• All about increasing the rate at the pins
• Not an improvement in latency
  – In fact, latency can sometimes be worse
• Internal banks often consumed for increased bandwidth

DRAM name based on peak chip transfers/sec; DIMM name based on peak DIMM MBytes/sec

  Standard | Clock Rate (MHz) | M transfers/sec (x2) | DRAM Name | MBytes/s/DIMM (x8) | DIMM Name
  DDR      | 133              | 266                  | DDR266    | 2128               | PC2100
  DDR      | 150              | 300                  | DDR300    | 2400               | PC2400
  DDR      | 200              | 400                  | DDR400    | 3200               | PC3200
  DDR2     | 266              | 533                  | DDR2-533  | 4264               | PC4300
  DDR2     | 333              | 667                  | DDR2-667  | 5336               | PC5300
  DDR2     | 400              | 800                  | DDR2-800  | 6400               | PC6400
  DDR3     | 533              | 1066                 | DDR3-1066 | 8528               | PC8500
  DDR3     | 666              | 1333                 | DDR3-1333 | 10664              | PC10700
  DDR3     | 800              | 1600                 | DDR3-1600 | 12800              | PC12800
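The x2 and x8 arrows under the table are just arithmetic: transfers per second are twice the clock (data moves on both edges), and a 64-bit (8-byte) DIMM moves 8 bytes per transfer. A small C check of this (ours, not part of the lecture):

    #include <stdio.h>

    /* Peak-rate arithmetic behind the table: x2 for both clock edges,
     * x8 for a 64-bit-wide DIMM. */
    static void ddr_rates(const char *std, double clock_mhz)
    {
        double mtransfers = 2.0 * clock_mhz;   /* x2: both clock edges  */
        double mbytes_s   = 8.0 * mtransfers;  /* x8: 8 bytes per transfer */
        printf("%-5s %5.0f MHz -> %5.0f MT/s -> %6.0f MB/s per DIMM\n",
               std, clock_mhz, mtransfers, mbytes_s);
    }

    int main(void)
    {
        ddr_rates("DDR",  133);   /* DDR266   / PC2100  */
        ddr_rates("DDR2", 400);   /* DDR2-800 / PC6400  */
        ddr_rates("DDR3", 800);   /* DDR3-1600 / PC12800 */
        return 0;
    }

The sketch uses the exact-clock rows (133 x 2 = 266 MT/s, x 8 = 2128 MB/s, matching PC2100); rows such as DDR2-667 come from nominally fractional clocks (333 1/3 MHz), which is why the table shows 667 rather than 666.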
DRAM Packaging
• A DRAM chip has ~7 clock and control signals, ~12 address lines (multiplexed row/column address), and a data bus (4, 8, 16, or 32 bits)
• A DIMM (Dual Inline Memory Module) contains multiple chips arranged in "ranks"
• Each rank has clock/control/address signals connected in parallel (sometimes buffers are needed to drive the signals to all chips), and data pins that work together to return a wide word
  – e.g., a rank could implement a 64-bit data bus using 16 x 4-bit chips, or a 64-bit data bus using 8 x 8-bit chips
• A modern DIMM usually has one or two ranks (occasionally 4 if high capacity)
  – A rank will contain the same number of banks as each constituent chip (e.g., 4-8)

DRAM Channel
[Figure: a memory controller drives a command/address bus and a 64-bit data bus to a channel of two ranks; each rank is built from four 16-bit-wide chips, and each chip contains multiple banks]
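To make the row/bank/column hierarchy concrete, here is a sketch (ours) that splits a word address into the fields of the Micron 128 Mb part quoted earlier (row 12 bits, bank 2 bits, column 9 bits). The field order chosen here, column in the low bits, then bank, then row, is an assumption for illustration; real controllers pick the mapping to maximize row-buffer hits and bank parallelism.

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed layout: | row (12) | bank (2) | column (9) |  (low bits right) */
    typedef struct { unsigned row, bank, col; } dram_addr;

    static dram_addr split_addr(uint32_t a)
    {
        dram_addr d;
        d.col  = a         & 0x1FF;   /* low 9 bits: column */
        d.bank = (a >> 9)  & 0x3;     /* next 2 bits: bank  */
        d.row  = (a >> 11) & 0xFFF;   /* next 12 bits: row  */
        return d;
    }

    int main(void)
    {
        /* consecutive addresses land in the same row: good for burst mode */
        for (uint32_t a = 0x12340; a < 0x12344; a++) {
            dram_addr d = split_addr(a);
            printf("addr 0x%05x -> row %4u bank %u col %3u\n",
                   (unsigned)a, d.row, d.bank, d.col);
        }
        return 0;
    }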
FB-DIMM Memories
• Uses commodity DRAMs with a special controller on the actual DIMM board
• Connection is in a serial form: unlike a regular DIMM's parallel bus, the memory controller daisy-chains from FB-DIMM to FB-DIMM

FLASH Memory
• Like a normal transistor, but:
  – Has a floating gate that can hold charge
  – To write: raise or lower the wordline high enough to cause charges to tunnel
  – To read: turn on the wordline as if a normal transistor
    » Presence of charge changes the threshold and thus the measured current
• Two varieties:
  – NAND: denser, must be read and written in blocks
  – NOR: much less dense, fast to read and write
[Photo: Samsung 2007 16 GB NAND Flash]

Phase Change Memory (IBM, Samsung, Intel)
• Phase Change Memory (called PRAM or PCM)
  – Chalcogenide material can change from amorphous to crystalline state with application of heat
  – The two states have very different resistive properties
  – Similar to the material used in the CD-RW process
• Exciting alternative to FLASH
  – Higher speed
  – May be easy to integrate with CMOS processes

Tunneling Magnetic Junction
• Tunneling Magnetic Junction RAM (TMJ-RAM)
  – Speed of SRAM, density of DRAM, non-volatile (no refresh)
  – "Spintronics": combination of quantum spin and electronics
  – Same technology used in high-density disk drives

Big storage (such as DRAM/DISK): Potential for Errors!
• Motivation:
  – DRAM is dense => signals are easily disturbed
  – High capacity => higher probability of failure
• Approach: redundancy
  – Add extra information so that we can recover from errors
  – Can we do better than just creating complete copies?
• Block codes: data coded in blocks
  – k data bits coded into n encoded bits
  – Measure of overhead: rate of code = k/n
  – Often called an (n,k) code
  – Consider data as vectors in GF(2) [i.e., vectors of bits]
• Code space is the set of all 2^n vectors; data space is the set of 2^k vectors
  – Encoding function: C = f(d)
  – Decoding function: d = f(C')
  – Not all possible code vectors, C, are valid!

Error Correction Codes (ECC)
• Memory systems generate errors (accidentally flipped bits)
  – DRAMs store very little charge per bit
  – "Soft" errors occur occasionally when cells are struck by alpha particles or other environmental upsets
  – Less frequently, "hard" errors can occur when chips permanently fail
  – The problem gets worse as memories get denser and larger
• Where is "perfect" memory required?
  – Servers, spacecraft/military computers, ebay, ...
• Memories are protected against failures with ECCs
• Extra bits are added to each data word
  – Used to detect and/or correct faults in the memory system
  – In general, each possible data word value is mapped to a unique "code word". A fault changes a valid code word to an invalid one, which can be detected.

General Idea: Code Vector Space
[Figure: data space mapped by C0 = f(v0) into code space; valid code words separated by the code distance (Hamming distance)]
• Not every vector in the code space is valid
• Hamming distance (d):
  – Minimum number of bit flips to turn one code word into another
• Number of errors that we can detect: d - 1
• Number of errors that we can fix: floor((d-1)/2)

Some Code Types
• Linear codes: C = G . d and S = H . C
  – The code is generated by G and is in the null space of H
  – (n,k) code: data space 2^k, code space 2^n
  – (n,k,d) code: specify the distance d as well
• Random-error code:
  – Need to both identify errors and correct them
  – Distance d => correct floor((d-1)/2) errors
• Erasure code:
  – Can correct errors if we know which bits/symbols are bad
  – Example: RAID codes, where "symbols" are blocks of disk
  – Distance d => correct (d-1) erasures
• Error detection code:
  – Distance d => detect (d-1) errors
• Hamming codes:
  – d = 3: columns nonzero, distinct
  – d = 4: columns nonzero, distinct, odd-weight
• Binary Golay code: based on quadratic residues mod 23
  – Binary codes: [24, 12, 8] and [23, 12, 7]
  – Often used in space-based schemes; can correct 3 errors

Hamming Bound, symbols in GF(2)
• Consider an (n,k) code with distance d
  – How do n, k, and d relate to one another?
• First question: how big are spheres?
  – For distance d, spheres are of radius floor((d-1)/2)
    » i.e., all errors of weight floor((d-1)/2) or less must fit within the sphere
  – Thus, the size of a sphere is at least:
      1 + Num(1-bit err) + Num(2-bit err) + ... + Num(floor((d-1)/2)-bit err)
        = sum over e = 0 .. floor((d-1)/2) of C(n,e)
• The Hamming bound reflects bin-packing of spheres: need 2^k of these spheres within the code space:
      2^k * [ sum over e = 0 .. floor((d-1)/2) of C(n,e) ] <= 2^n
  – For d = 3: 2^k * (1 + n) <= 2^n

How to Generate code words?
• Consider a linear code. Need a generator matrix:
  – Let vi be the data value (k bits) and Ci the resulting code word (n bits):
      Ci = G . vi, where G must be an n x k matrix
• Are there 2^k unique code values?
  – Only if the k columns of G are linearly independent!
• Of course, we need some way of decoding as well:
      vi = f(Ci')
  – Is this linear??? Why or why not?
• A code is systematic if the data is directly encoded within the code words
  – Means the generator has the form G = [ I / P ] (identity stacked on parity rows)
  – Can always turn a non-systematic code into a systematic one (row ops)

Implicitly Defining Codes by Check Matrix
• But - what is the distance of the code? Not obvious
• Instead, consider a parity-check matrix H ((n-k) x n)
  – Compute the following syndrome Si given code element Ci:
      Si = H . Ci
  – Define valid code words Ci as those that give Si = 0 (the null space of H)
  – Size of the null space? (n - rank H) = k if H has (n-k) linearly independent columns
• Suppose you transmit code word C, and there is an error. Model this as a vector E which flips selected bits of C to get R (received):
      R = C + E        (E is the error vector)
• Consider what happens when we multiply by H:
      S = H . R = H . (C + E) = H . E
• What is the distance of the code?
  – The code has distance d if no sum of d-1 or fewer columns of H yields 0
  – i.e., no error vector E of weight < d has a zero syndrome
  – Code design: design an H matrix with these properties (example sketch below)
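Syndrome computation is only a handful of XORs. The sketch below (ours, using a made-up (6,3) code whose columns are nonzero and distinct, hence d = 3) stores each row of H as a bit mask and checks that S = H . R is zero exactly for valid code words and nonzero for every single-bit error:

    #include <stdio.h>

    /* Made-up (6,3) code: C0..C2 are data, C3..C5 are parity bits.
     * Each row of H is a 6-bit mask; bit j of a mask = code bit Cj. */
    static const unsigned Hrow[3] = {
        0b001011,   /* row [1 1 0 | 1 0 0]: checks C0, C1, C3 */
        0b010101,   /* row [1 0 1 | 0 1 0]: checks C0, C2, C4 */
        0b100110,   /* row [0 1 1 | 0 0 1]: checks C1, C2, C5 */
    };

    static unsigned parity(unsigned x)
    {
        unsigned p = 0;
        for (; x; x >>= 1) p ^= x & 1;
        return p;
    }

    static unsigned syndrome(unsigned r)        /* S = H . R over GF(2) */
    {
        unsigned s = 0;
        for (int i = 0; i < 3; i++)
            s |= parity(Hrow[i] & r) << i;      /* row dot product = parity */
        return s;
    }

    static unsigned encode(unsigned v)          /* 3 data bits -> 6-bit code word */
    {
        unsigned c = v & 7;                     /* data in C0..C2 */
        c |= parity(v & 0b011) << 3;            /* C3 = d0 + d1 */
        c |= parity(v & 0b101) << 4;            /* C4 = d0 + d2 */
        c |= parity(v & 0b110) << 5;            /* C5 = d1 + d2 */
        return c;
    }

    int main(void)
    {
        for (unsigned v = 0; v < 8; v++) {
            unsigned c = encode(v);
            if (syndrome(c) != 0) { printf("bad code word\n"); return 1; }
            for (int j = 0; j < 6; j++)         /* a 1-bit error returns column j of H */
                if (syndrome(c ^ (1u << j)) == 0) { printf("missed error\n"); return 1; }
        }
        printf("S = 0 for all code words; S != 0 for every single-bit error\n");
        return 0;
    }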
How to relate G and H (Binary Codes)
• Defining H makes it easy to understand the distance of the code, but hard to generate the code (H defines the code implicitly!)
• However, let H be of the following form:
      H = [ P | I ], where P is (n-k) x k and I is (n-k) x (n-k)
      Result: H is (n-k) x n
• Then G can be of the following form (maximal code size):
      G = [ I / P ], where I is k x k and P is (n-k) x k
      Result: G is n x k
• Notice: G generates values in the null space of H:
      Si = H . G . vi = [ P | I ] . [ I / P ] . vi = (P + P) . vi = 0

Simple example (Parity, d=2)
• Parity code (8 data bits, 1 parity bit):
      G = [ I8 / 1 1 1 1 1 1 1 1 ]    (9 x 8: the 8 x 8 identity with a row of eight 1s appended)
      H = [ 1 1 1 1 1 1 1 1 1 ]       (1 x 9)
  – Encoder: parity bit c8 = XOR of v7..v0; decoder: syndrome s0 = XOR of received bits C8..C0
• Note: the complexity of the logic depends on the number of 1s in each row!

Simple example: Repetition (voting, d=3)
• Repetition code (1 data bit, 3 code bits):
      G = [ 1 / 1 / 1 ]
      H = [ 1 1 0
            1 0 1 ]
  – Decode by majority vote of C0, C1, C2; a nonzero syndrome flags the error
• Positives: simple
• Negatives:
  – Expensive: only 33% of the code word is data
  – Not packed in the Hamming-bound sense (only d=3). Could get much more efficient coding by encoding multiple bits at a time

Simple Example: Hamming Code (d=3)
• Example: (7,4) code:
  – Protect 4 data bits with 3 parity bits
  – Bit positions 1..7, numbered from the left: p1 p2 d1 p3 d2 d3 d4
  – Each parity bit checks the positions whose binary position number contains its bit:
    » p1 checks positions 1, 3, 5, 7
    » p2 checks positions 2, 3, 6, 7
    » p3 checks positions 4, 5, 6, 7
      G = [ 1 1 0 1
            1 0 1 1
            1 0 0 0
            0 1 1 1
            0 1 0 0
            0 0 1 0
            0 0 0 1 ]

      H = [ 1 0 1 0 1 0 1
            0 1 1 0 0 1 1
            0 0 0 1 1 1 1 ]
  – Column j of H is the binary representation of position j, least-significant bit in the top row

How to correct errors?
• Recall: the syndrome depends only on the error: S = H . R = H . E
• Suppose two correctable error vectors E1 and E2 produce the same syndrome:
      H . E1 = H . E2  =>  H . (E1 + E2) = 0  =>  E1 + E2 has d or more bits set
• But since E1 and E2 each have at most floor((d-1)/2) bits set, E1 + E2 has at most d-1 bits set - so this cannot be true!
• So the syndrome is a unique indicator of each correctable error vector (demonstrated in the sketch below)
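Putting the pieces together, here is a runnable C version (ours) of the (7,4) Hamming code above. Code-word bit i holds position i+1 in the slide's numbering, so the three parity checks read the syndrome back as the error position directly:

    #include <stdio.h>

    static unsigned parity(unsigned x)
    {
        unsigned p = 0;
        for (; x; x >>= 1) p ^= x & 1;
        return p;
    }

    /* check masks (bit k = position k+1):
     * p1 checks positions 1,3,5,7; p2: 2,3,6,7; p3: 4,5,6,7 */
    static const unsigned CHK[3] = { 0b1010101, 0b1100110, 0b1111000 };

    static unsigned encode(unsigned d)           /* d = d1..d4 in bits 0..3 */
    {
        unsigned c = ((d      & 1) << 2)         /* d1 -> position 3 */
                   | (((d >> 1) & 1) << 4)       /* d2 -> position 5 */
                   | (((d >> 2) & 1) << 5)       /* d3 -> position 6 */
                   | (((d >> 3) & 1) << 6);      /* d4 -> position 7 */
        c |= parity(c & CHK[0]) << 0;            /* p1 -> position 1 */
        c |= parity(c & CHK[1]) << 1;            /* p2 -> position 2 */
        c |= parity(c & CHK[2]) << 3;            /* p3 -> position 4 */
        return c;
    }

    static unsigned correct(unsigned r)          /* fix up to one flipped bit */
    {
        unsigned pos = parity(r & CHK[0])        /* syndrome = binary error position */
                     | parity(r & CHK[1]) << 1
                     | parity(r & CHK[2]) << 2;
        return pos ? r ^ (1u << (pos - 1)) : r;  /* pos = 0 means no error */
    }

    int main(void)
    {
        unsigned c = encode(0b1011);
        for (int bit = 0; bit < 7; bit++)        /* try every single-bit error */
            if (correct(c ^ (1u << bit)) != c) {
                printf("failed at bit %d\n", bit);
                return 1;
            }
        printf("all single-bit errors corrected\n");
        return 0;
    }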
Example, d=4 code (SEC-DED)
• Design H with:
  – All columns nonzero, odd-weight, distinct
    » Note that odd-weight refers to the Hamming weight, i.e., the number of ones
• Why does this generate d=4?
  – Any single-bit error will generate a distinct, nonzero value
  – Any double error will generate a distinct, nonzero value
    » Why? Add together two distinct columns, get a distinct result
  – Any triple error will generate a nonzero value
    » Why? Add together three odd-weight values, get an odd-weight value
  – So: need four errors before being indistinguishable from a code word
• Because d=4:
  – Can correct 1 error (Single Error Correction, i.e., SEC)
  – Can detect 2 errors (Double Error Detection, i.e., DED)
• Example: S = H . C with
      H = [ 1 1 1 0 1 0 0 0
            1 1 0 1 0 1 0 0
            1 0 1 1 0 0 1 0
            0 1 1 1 0 0 0 1 ]
  – Note: the log size of the null space will be (columns - rank) = 4, so:
    » Rank = 4, since the rows are independent (4 columns independent)
    » Clearly, 8 bits in a code word
    » Thus: an (8,4) code

Tweaks:
• No reason the code cannot be made shorter than required
• Suppose n-k = 8 bits of parity. What is the max code size (n) for d=4?
  – Maximum number of unique, odd-weight columns: 2^7 = 128
  – So, n = 128. But then k = n - (n-k) = 120. Weird!
  – Just throw out the columns of high weight and make a (72, 64) code!
• But - shortened codes like this might have d > 4 in some special directions
  – Example: Kaneda paper - catches failures of groups of 4 bits
  – Good for catching chip failures when DRAM has groups of 4 bits
• What about the EVENODD code?
  – Can be used to handle two erasures
  – What about two dead DRAMs? Yes, if you can really know they are dead

Aside: Galois Field Elements
• Definition: a field is a complete group of elements with:
  – Addition, subtraction, multiplication, division
  – Completely closed under these operations
  – Every element has an additive inverse
  – Every element except zero has a multiplicative inverse
• Examples:
  – Real numbers
  – Binary, called GF(2): Galois Field with base 2
    » Values 0, 1. Addition/subtraction: use xor. Multiplicative inverse of 1 is 1
  – Prime field, GF(p): Galois Field with base p
    » Values 0 ... p-1
    » Addition/subtraction/multiplication: modulo p
    » Multiplicative inverse: every value except 0 has an inverse
    » Example, GF(5): 1·1 ≡ 1 mod 5, 2·3 ≡ 1 mod 5, 4·4 ≡ 1 mod 5
  – General Galois Field: GF(p^m), base p (prime!), dimension m
    » Values are vectors of elements of GF(p) of dimension m
    » Add/subtract: vector addition/subtraction
    » Multiply/divide: more complex
    » Just like real numbers, but finite!
    » Common for computer algorithms: GF(2^m)

Reed-Solomon Codes
• Galois field codes: code words consist of symbols
  – Rather than bits
• Reed-Solomon codes:
  – Based on polynomials in GF(2^k) (i.e., k-bit symbols)
  – Data as coefficients, code space as values of the polynomial:
      P(x) = a0 + a1·x + ... + a(k-1)·x^(k-1)
  – Coded: P(0), P(1), P(2), ..., P(n-1)
  – Can recover the polynomial as long as we get any k of the n values
• Properties: can choose the number of check symbols
  – Reed-Solomon codes are "maximum distance separable" (MDS)
  – Can add d symbols for a distance d+1 code
  – Often used in "erasure code" mode: as long as no more than n-k coded symbols are erased, can recover the data
• Side note: multiplication by a constant a in GF(2^k) can be represented by a k x k bit matrix:
  – Decompose the unknown vector into k bits: x = x0 + 2·x1 + ... + 2^(k-1)·x(k-1)
  – Each column is the result of multiplying a by 2^i

Reed-Solomon Codes (con't)
• Reed-Solomon codes (non-systematic):
  – Data as coefficients, code space as values of the polynomial:
      P(x) = a0 + a1·x + ... + a6·x^6
  – Coded: P(0), P(1), P(2), ..., P(6)
• The generator is a Vandermonde matrix (maximum rank): the row for evaluation point i is (1, i, i^2, i^3, ...), so C = G . (a0, a1, ...) evaluates P at each point
• Different representation (this H' and G are not related):
  – H' has column j equal to (1, j, j^2, j^3, j^4) for j = 1 ... 7
  – Clear that all combinations of two or fewer columns are independent => d=3
  – Very easy to pick whatever d you happen to want
• Fast, systematic version of Reed-Solomon:
  – Cauchy Reed-Solomon

Conclusion
• Main memory is dense, slow
  – Cycle time > access time!
• Techniques to optimize memory:
  – Wider memory
  – Interleaved memory: for sequential or independent accesses
  – Avoiding bank conflicts: SW & HW
  – DRAM-specific optimizations: page mode & specialty DRAM
• ECC: add redundancy to correct errors
  – (n,k,d): n code bits, k data bits, distance d
  – Linear codes: code vectors computed by linear transformation
• Erasure code: after identifying "erasures", can correct
• Reed-Solomon codes
  – Based on GF(p^n), often GF(2^n)
  – Easy to get a distance d+1 code with d extra symbols
  – Often used in erasure mode