NAND Flash Memories For Spacecraft

By ECC Technologies, Inc. (“ECC Tek”)
4750 Coventry Road East
Minnetonka, MN 55345-3909
Website: www.ecctek.com
Phone: 952-935-2885
Fax: 952-935-2491
Email: phil.white@ecctek.com

May 13, 2010

Notice

This document does not contain any information ECC Technologies, Inc. considers confidential so you may copy and distribute the document freely.

Phil White
President
ECC Technologies, Inc. (ECC Tek)

Revision History

Revision dates (author pew): 05-02-10, 05-07-10, 05-10-10, 05-12-10, 05-12-10, 05-13-10

Entries, in order:
- Created initial version of this document based on previous documents and drawings.
- Created v1 of this document.
- Created v2 of this document. Added sections to handle NAND Flash memory configurations with and without data path width converters.
- Created v3 of this document. Widened the second sample configuration to handle 65-bit wide data instead of 48 as in v2.
- Stopped using version numbers so that we can focus on developing one final document. Readers will know what version they have by looking at the Revision History and the date of the document.

Sections

1. Introduction
2. Brief History of ECC in NAND Flash
3. Issues Involved in Continuing to Increase t in Binary BCH Codes
4. N-Page Blocks Divided into Segments
5. 2D Encoding Concept
6. Hardware Encoding and Decoding of Segments
7. Correcting Erasures Plus Errors
8. Repeated Decodings
9. Interface to the NAND Flash Memory
10. High-Level Block Diagram of Data Collection and Downloading System
11. NAND Flash Memory Configurations With Data Path Width Converters
11.1. 2D RS Encoding for First Sample Configuration
11.2. 2D RS Decoding for First Sample Configuration
11.3. Percentage Redundancy for First Sample Configuration
12. NAND Flash Memory Configurations Without Data Path Width Converters
12.1. 2D RS Encoding for Second Sample Configuration
12.2. 2D RS Decoding for Second Sample Configuration
12.3. Percentage Redundancy for Second Sample Configuration

Figures

Figure 1 N Pages Divided into Segments
Figure 2 Encoded Data for 2D RS ECC Schemes
Figure 3 Encoding and Decoding of 2D Arrays in Hardware
Figure 4 64-bit to 44-bit Data Path Width Converter
Figure 5 64-bit to 40-bit Data Path Width Converter
Figure 6 Spacecraft Data Collection and Downloading System
Figure 7 Encoded 2D RS Array for First Sample Configuration
Figure 8 2D RS Encoding for First Sample Configuration
Figure 9 2D RS Decoding for First Sample Configuration
Figure 10 Percentage Redundancy for First Sample Configuration
Figure 11 Encoded 2D RS Array for Second Sample Configuration
Figure 12 2D RS Encoding for Second Sample Configuration
Figure 13 2D RS Decoding for Second Sample Configuration
Figure 14 Percentage Redundancy for Second Sample Configuration

1. Introduction

This document describes how to apply ECC Tek’s two-dimensional (2D) Reed-Solomon (RS) error-correction (ECC) system to create fault-tolerant NAND Flash memories for spacecraft.

Many commercial companies are currently developing NAND Flash storage systems. For example, Fusion-IO has recently received an additional $45M in funding and Pliant Technology has recently received an additional $27M. Established hard disk drive (HDD) companies Seagate and Western Digital also sell NAND Flash SSD devices.

NAND Flash solid state drives (SSDs) are attractive as replacements for HDDs because they are faster, take less power, and are more shock resistant than HDDs. Unlike HDDs, SSDs have no moving parts and are silent.

However, there are serious problems associated with using NAND Flash devices that need to be solved. NAND Flash devices were originally designed as electrically erasable programmable read only memories (EEPROMs), not as random access memories. NAND Flash devices cannot be reprogrammed/rewritten an unlimited number of times as HDDs can because they degrade with use. Also, blocks of storage cells in NAND Flash devices must be erased before sub blocks (Pages) can be programmed/written.

The wear-out and failure characteristics of NAND Flash devices require the use of a powerful error-correction system to allow NAND Flash memories to be reliable over a long period of time. The use of NAND Flash memories in spacecraft will also require the ECC system to tolerate entire chip failures.

This document presents a general methodology for implementing highly reliable and fault-tolerant NAND Flash memories. In order to quickly and effectively convey the concepts involved, two sample configurations are described. The first configuration requires data path width converters and the second one does not.
The two configurations show designers how to create other configurations with the same characteristics as the two samples. The first configuration is optimized to reduce gate count. The second configuration is optimized to make it as easy as possible to match the data path width of the NAND Flash memory to the widths of other system data paths.

Readers should keep in mind that other cases can be easily created with different Reed-Solomon (RS) symbol sizes and different levels of correction. The methodology is not limited to the two described sample cases.

ECC Tek strongly believes there is very little risk in implementing the methodology described in this document, for three reasons. First, ECC Tek has more than 30 years of experience in designing ECC circuits. Second, the idea of correcting erasures proposed in this document has been implemented in a slightly different form in multi-track tape drives to tolerate the failure of multiple tracks. Third, the idea of using error-correction software as a recovery procedure in the event that hardware correction fails was used in the first Reed-Solomon HDD implementations at what is now Seagate.

With a 2D RS scheme as described in this document, the designer of a NAND Flash memory has the freedom to arbitrarily increase the level of protection and also the number of failed chips the system can tolerate. No other existing ECC system gives designers that freedom.

2. Brief History of ECC in NAND Flash

Shortly after NAND Flash devices were first manufactured, Samsung recommended an ECC system that corrected t=1 bit in each Page. Pages were then ~512 bytes. The t=1 ECC recommended by Samsung is very similar to a Hamming code, which is a binary BCH code that can correct t=1 bit.

ECC Tek has designed and licensed numerous ECC designs for NAND Flash, a number of which are currently in mass production.

The original ECC design ECC Tek licensed for NAND Flash was a Reed-Solomon (RS) system that operated on 10-bit symbols and corrected t=5 10-bit symbol errors. Two of the bits in each symbol were forced to 0 in the data field so that the data field actually consisted of 8-bit bytes. ECC Tek’s first RS system for NAND Flash was implemented in 2005 and went into mass production shortly thereafter.

In 2007, ECC Tek developed and licensed its first programmable binary BCH encoder and decoder designs for use with NAND Flash devices, which could correct up to t=18 bits. Shortly thereafter, ECC Tek developed and licensed a programmable binary BCH ECC system that could correct up to t=30 bits. In the last 3 years, ECC Tek has licensed a number of similar binary BCH encoder and decoder designs for NAND Flash.

ECC Tek has developed and synthesized a binary BCH design that can correct up to t=44 bits in 1024-byte Pages, and there has been interest in binary BCH designs for NAND Flash that can correct up to t=60 bits.

3. Issues Involved in Continuing to Increase t in Binary BCH Codes

When the number of correctable bits, t, and/or the data field length, K, increases, the complexity of binary BCH encoders and decoders increases rapidly, which limits the degree to which t and K can be increased. ECC Tek has estimated that correcting t=60 bits in 1024-byte data field lengths will require around 250K gates in an ASIC when a fairly low level of parallelism is implemented in the decoder. Implementing a t=60 binary BCH design in an FPGA may not be practical at the present time.

When implementing binary BCH encoders and decoders, the codeword contains binary symbols (“bits”), and a Galois field (finite field) must be used to uniquely identify, tag, address or locate each codeword symbol/bit. For NAND Flash Page sizes larger than 1024 bytes but less than 2048 bytes, a locator field with m=14-bit elements must be used. Fourteen-bit finite field elements are relatively large.
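
The locator field width quoted above can be checked with a short calculation. The sketch below is illustrative only and is not ECC Tek code. It assumes the standard property that a binary BCH codeword over a field with m-bit elements can be at most 2^m - 1 bits long, and it budgets the worst-case m*t redundant bits. The function name min_locator_width is hypothetical.

```python
def min_locator_width(data_bytes: int, t: int) -> int:
    """Smallest m such that data plus up to m*t redundant bits fit in 2**m - 1 bits.

    Assumption: worst-case redundancy of m*t bits for a t-bit-correcting
    binary BCH code, and a maximum codeword length of 2**m - 1 bits.
    """
    data_bits = 8 * data_bytes
    m = 1
    while data_bits + m * t > (1 << m) - 1:
        m += 1
    return m


# Example data field sizes and correction levels mentioned in this document.
for data_bytes, t in [(512, 18), (1024, 44), (1024, 60)]:
    m = min_locator_width(data_bytes, t)
    print(f"{data_bytes}-byte data field, t={t}: m={m}, at most {m * t} redundant bits")
```

For a 1024-byte data field the 8192 data bits alone already exceed 2^13 - 1 = 8191, which is why a 14-bit locator field is needed in that case.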

The number of redundant bits required by a t-bit error-correcting binary BCH code is r ≤ mt, where m is the width of the locator field elements. Usually the number of redundant bits required is r = mt.

When implementing RS codes, the codeword contains nonbinary symbols, and smaller locator fields can be used than those required for binary BCH codes. For example, an RS locator field with 10-bit elements was used in ECC Tek’s first RS design for Flash, and a locator field with 13-bit elements was used in ECC Tek’s first binary BCH design. Both designs were for 512-byte data field lengths.

Apparently, as time passes and feature sizes decrease, the reliability of NAND Flash devices is decreasing, because ECC Tek continues to receive requests to correct more and more bits, up to t=60 bits per Page. Seven years earlier only 1 bit per Page was corrected. This trend will most likely continue, especially with multi-level cell (MLC) NAND Flash devices.

Designing binary BCH encoders and decoders with very large t is difficult because both the time it takes to compute the error locator polynomial, L(x), from the syndrome, S(x), and the complexity (gate count) of the decoder increase rapidly as t is increased. Synthesis results indicate that a t=50 binary BCH decoder would probably take ~250K gates in an ASIC if the decoder were designed so that it could keep up with continuous input data when correcting 50 bits in each received word.

The above-mentioned ECC schemes do not include any provision to correct errors caused by entire chip failures. In order to achieve fault tolerance, some type of 2D ECC scheme must be used.

4. N-Page Blocks Divided into Segments

Data will be written to N NAND Flash chips in N-Page Blocks which are divided into logical Segments as illustrated in Figure 1. Think of each column in Figure 1 as containing a single column of “bits”.
Assume one Page contains approximately 4K bytes.

[Figure: Pages 1, 2, 3, ..., N shown as columns, divided horizontally into Segment 1, Segment 2, ..., Segment n]
Figure 1 N Pages Divided into Segments

Each Segment will contain N vertical codewords.

For binary BCH codes currently being implemented, each vertical codeword is approximately 512 or 1024 bytes with 4 or 8 Segments per Page. ECC Tek is proposing the use of RS vertical codewords for the 2D ECC system instead of binary BCH codewords and much smaller Segments than what are currently being used. The maximum height (in number of symbols) of one vertical codeword in a Segment is determined by the size of the RS column symbol being used.

5. 2D Encoding Concept

Think of the data in one Segment as a matrix or two-dimensional array of data items. Think of the columns as vertical RS codewords and the rows that contain data as horizontal RS codewords. Figure 2 illustrates how redundant data items are appended onto freely chosen data items in a 2D ECC encoding scheme. Most likely 2D encoding schemes will be widely implemented in the near future. In Figure 2, the rows are encoded first and then the columns.

[Figure: 2D array of freely chosen data items with redundant data items appended. Redundant data items are mathematical functions of the freely chosen data items or previously computed redundant data items.]
Figure 2 Encoded Data for 2D RS ECC Schemes

6. Hardware Encoding and Decoding of Segments

A high-level block diagram of a NAND Flash storage device that encodes and decodes Segments in hardware using a parallel Reed-Solomon (PRS) row encoder and decoder and multiple RS column encoders and decoders is illustrated in Figure 3.

[Figure: Data In -> PRS Row Encoder -> RS Column Encoders -> NAND Flash Chips 1, 2, ..., N -> RS Column Decoders -> PRS Row Decoder -> Data Out]
Figure 3 Encoding and Decoding of 2D Arrays in Hardware

When writing, rows are encoded by the PRS row encoder, columns are encoded by multiple RS column encoders, and a 2D encoded Segment, as illustrated in Figure 2, is written to N NAND Flash chips.

When reading, columns read from the N NAND Flash chips are decoded by multiple RS column decoders and rows are decoded by a PRS row decoder. The PRS row decoder will pass through the vertical redundancy unaltered if any of the row decodings for one Segment fails. If none of the row decodings for one Segment fails, the 2D decoder has determined that either no errors occurred or that all of the errors have been properly corrected.

Since multiple column encoders and decoders are required, the complexity of one column encoder and decoder pair must be reasonable in order for the 2D scheme to be practical, especially when the 2D ECC scheme is implemented in FPGAs as is the standard practice for spacecraft.

Since we are considering relatively small symbol sizes, there will necessarily be multiple Segments per Page. For example, if we consider 4096-byte Pages and 6-bit column symbols, there will be more than 92 Segments per Page.

7. Correcting Erasures Plus Errors

Both the RS column and row encoders and decoders implemented in hardware can be designed to correct erasures plus errors. An erasure means a codeword location where the codeword symbol in that location has a high probability of being in error, but is not necessarily in error.

The PRS row decoder should always be designed to correct errors plus erasures, but the RS column decoders would probably be designed to correct errors only. When a column decoding fails, the decode fail signal from the column decoder indicates that all of the row symbols in that vertical codeword are erased.

The PRS row decoder can correct t errors and s erasures as long as 2t + s ≤ R, where R is the number of redundant symbols per row codeword.

8. Repeated Decodings

Both the row and column decoders can output entire corrected codewords (both corrected data and corrected redundancy), so both column and row decoding on each Segment can be repeated an unlimited number of times if desired or necessary.

If none of the row decodings on a Segment fail, then it is assumed that the data in that Segment either has no errors or that all of the errors have been properly corrected. If any of the row decodings fail, the PRS row decoder will pass the corrected column redundancy through without altering it so that column and row decodings can be repeated if desired.

It is possible to duplicate the column and row hardware decoders multiple times or to feed the data back to the input by using more FIFOs and muxes, but that would probably be overkill since one column decoding followed by one row decoding will most likely correct all of the errors almost all of the time.

Most likely the best way to implement multiple decodings would be to do repeated decodings in software only when needed. Software correction time is probably not critical because repeated decodings would rarely be done. That way, row and column decodings can be repeated any number of times and no additional hardware is required, which keeps the complexity of the ECC hardware to a minimum.
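
As a rough illustration of how a software recovery procedure might repeat the column and row decodings, the sketch below loops until no row decoding fails or a pass limit is reached. It is a sketch under stated assumptions, not ECC Tek's C code: the function name recover_segment, the two decoder callbacks, the segment layout, and the max_passes limit are all hypothetical.

```python
from typing import Callable, List, Optional, Set, Tuple

Segment = List[List[int]]  # hypothetical layout: segment[row][column]


def recover_segment(
    segment: Segment,
    decode_columns: Callable[[Segment], Tuple[Segment, Set[int]]],
    decode_rows: Callable[[Segment, Set[int]], Tuple[Segment, bool]],
    max_passes: int = 8,
) -> Tuple[Optional[Segment], int]:
    """Repeat column-then-row decoding until no row decoding fails.

    decode_columns (hypothetical) returns the corrected Segment and the set
    of column indices whose decoding failed; those columns are treated as
    erasures by the row decoder.
    decode_rows (hypothetical) returns the corrected Segment and True if
    every row decoding succeeded.
    """
    for passes in range(1, max_passes + 1):
        segment, failed_columns = decode_columns(segment)
        segment, all_rows_ok = decode_rows(segment, failed_columns)
        if all_rows_ok:
            return segment, passes
    # The data could not be recovered within the pass limit.
    return None, max_passes
```

The pass limit is only a safeguard for the sketch; the text above places no limit on the number of repetitions.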

When an error pattern occurs that is too severe for one column and row decoding to correct (which may never happen or may happen only once a year), some means would need to be provided for the software to access and re-decode the codewords resulting from one correction and repeat the correction any number of times in an attempt to recover the data. Since ECC Tek licenses both C and Verilog code, the C code can be used for that purpose.

9. Interface to the NAND Flash Memory

N is the number of NAND Flash chips that are written and read simultaneously. In the first sample configuration N=15, the row symbol width is 4 bits and the number of redundant row symbols is 4, so the input and output data path widths for the NAND Flash memory can be up to 11 symbols = 44 bits.

What if we want to use a 44-bit wide NAND Flash memory in a 64-bit system? To do that, we need a 64-bit-to-44-bit converter at the input of the NAND Flash memory and a 44-bit-to-64-bit converter at the output.

For the input converter in this case we need to find the smallest integers x and y such that 64x = 44y. Note that x/y = 44/64 = 22/32 = 11/16. Since 11 and 16 have no common factors, we need eleven 64-bit registers to convert a 64-bit data path to a 44-bit data path as illustrated in Figure 4. The circuit shown in Figure 4 can be thought of as a FIFO with different input and output widths. Control logic similar to a standard FIFO’s control logic can be developed.

There are several ways to handle the clock with a 64-bit-to-44-bit input converter. One way is to pause the input for five clock cycles for every 11 input words while writing to the NAND Flash memory, since 11 input words produce 16 output words. Another way is to implement a state machine which will monitor the circuit and issue a pause signal to the input as needed. Another way is to use a higher frequency clock for the NAND Flash memory than what is used for the input logic.
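
The register count for a converter follows directly from the greatest common divisor of the two widths. The sketch below is a software model only, not the hardware design; the function names converter_ratio and repack and the most-significant-bit-first ordering are assumptions made for illustration.

```python
from math import gcd
from typing import Iterable, Iterator, Tuple


def converter_ratio(w_in: int, w_out: int) -> Tuple[int, int]:
    """Smallest (x, y) such that w_in * x == w_out * y."""
    g = gcd(w_in, w_out)
    return w_out // g, w_in // g


def repack(words: Iterable[int], w_in: int, w_out: int) -> Iterator[int]:
    """Regroup a stream of w_in-bit words into w_out-bit words, MSB first.

    Leftover bits (fewer than w_out) are not emitted, so feed a whole
    multiple of x input words, where (x, y) = converter_ratio(w_in, w_out).
    """
    buffer = 0  # bits waiting to be output
    count = 0   # number of valid bits held in buffer
    for word in words:
        buffer = (buffer << w_in) | (word & ((1 << w_in) - 1))
        count += w_in
        while count >= w_out:
            count -= w_out
            yield (buffer >> count) & ((1 << w_out) - 1)
            buffer &= (1 << count) - 1


# 64-bit to 44-bit: x input words fill exactly y output words.
x, y = converter_ratio(64, 44)
print(x, y)
# 64-bit to 40-bit: a smaller converter.
print(converter_ratio(64, 40))
```

The same repack function models the output-side converter when called with the widths swapped, for example repack(words, 44, 64).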

[Figure: 64-bit input feeding registers 1 to 11 under state machine (SM) control, producing 44-bit output words 1 to 16]
Figure 4 64-bit to 44-bit Data Path Width Converter

Other converters are simpler, such as the 64-bit-to-40-bit input converter shown in Figure 5.

[Figure: 64-bit input feeding registers 1 to 5 under state machine (SM) control, producing 40-bit output words 1 to 8]
Figure 5 64-bit to 40-bit Data Path Width Converter

Converters required at the output follow the same pattern as those shown and are easy to design.

10. High-Level Block Diagram of Data Collection and Downloading System

We will describe how to implement highly reliable, fault-tolerant NAND Flash memories to be used in spacecraft data collection and downloading systems as illustrated in Figure 6.

[Figure: Sensors, Cameras, etc. -> Data Collector -> Main Data FIFO (Dual-Port RAM) -> Transmitters -> Earth, with a State Machine receiving status from the FIFO and driving pause control to the Data Collector]
Figure 6 Spacecraft Data Collection and Downloading System

Assume that the Data Collector can be paused between words written to the Main Data FIFO (MDFIFO).

The MDFIFO can be implemented using a dual-port RAM. Assume the read and write address registers are initially reset to 0. When a word is written to the MDFIFO, the write address is incremented by 1. When a word is read from the MDFIFO, the read address is incremented by 1. Whenever the write or read address reaches the end of the RAM, it is reset to 0.

Assume that we have a “full_level” variable that is incremented by 1 every time a word is written to the MDFIFO and decremented by 1 every time a word is read from the MDFIFO. If a word is written and read in the same clock cycle, the full_level is not changed. The full_level variable indicates the number of words that are currently in the MDFIFO.

In order for the Transmitter to read the MDFIFO, the full_level must be > 1.
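
The MDFIFO bookkeeping can be modeled in a few lines of software. This is a behavioral sketch only, not ECC Tek's Verilog; the class name MainDataFifo, the clock method, and the pause_margin parameter (how close to full the FIFO must be before the pause signal is asserted) are assumptions introduced for illustration.

```python
class MainDataFifo:
    """Behavioral model of the MDFIFO address and full_level bookkeeping."""

    def __init__(self, depth: int, pause_margin: int = 4):
        self.ram = [0] * depth            # stands in for the dual-port RAM
        self.depth = depth
        self.pause_margin = pause_margin  # hypothetical "close to full" margin
        self.write_addr = 0
        self.read_addr = 0
        self.full_level = 0               # number of words currently stored

    @property
    def pause(self) -> bool:
        """Pause signal to the Data Collector, asserted when nearly full."""
        return self.full_level >= self.depth - self.pause_margin

    @property
    def readable(self) -> bool:
        """Read condition as stated in the text: full_level must be > 1."""
        return self.full_level > 1

    def clock(self, write_word=None, read=False):
        """One clock cycle: optionally write one word and/or read one word."""
        do_write = write_word is not None and self.full_level < self.depth
        do_read = read and self.readable
        out = None
        if do_read:
            out = self.ram[self.read_addr]
            # The address wraps to 0 at the end of the RAM.
            self.read_addr = (self.read_addr + 1) % self.depth
        if do_write:
            self.ram[self.write_addr] = write_word
            self.write_addr = (self.write_addr + 1) % self.depth
        # A write and a read in the same cycle leave full_level unchanged.
        self.full_level += int(do_write) - int(do_read)
        return out
```

A caller would check pause before presenting a new word and check readable before requesting a read.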

If the full_level is close to the size of the RAM, the pause signal to the Data Collector will be asserted, indicating that inputs to the MDFIFO must be paused to avoid an overflow condition.

The above-described method of implementing a synchronous FIFO is the method that ECC Tek uses in all of its ECC designs for the Data FIFO (DFIFO) in its decoders.

The remainder of this document describes how the MDFIFO can be designed using NAND Flash memory chips so that it will be highly reliable over time and contain an arbitrary level of fault-tolerance.

11. NAND Flash Memory Configurations With Data Path Width Converters

Data path width converters may sometimes be required if designers wish to minimize the complexity (gate count) of the ECC circuits. Since most spacecraft electronics are implemented in FPGAs, it is often important and necessary to reduce gate counts so the circuits will fit into a small number of FPGAs.

We will use 6-bit column symbols and 4-bit row symbols for the first sample configuration with 4 redundant row symbols per horizontal codeword and 6 redundant column symbols per vertical codeword. For that case, up to 44-bit data words can be input in parallel.

With 6-bit column symbols, the maximum height of a vertical RS codeword is 63 6-bit symbols = 378 bits. Therefore, the height of each Segment must be at most 63 6-bit symbols.

For the first sample configuration, the height of one Segment was chosen to be 60 6-bit symbols because 60 symbols * 6 bits/symbol = 360 bits, which is a multiple of 4, 6 and 8, so that one column of a Segment contains a whole number of 4-bit, 6-bit and 8-bit quantities/symbols. With NAND Flash Page sizes of ~4096 bytes, there will be ~92 Segments per Page.

A 2D encoded array with 4-bit row symbols outlined in red and 6-bit column symbols outlined in blue is illustrated in Figure 7. All of the bits from one column are written to one NAND Flash chip.
It doesn’t matter whether the NAND Flash chip inputs and outputs 8 bits at a time or 16 bits at a time or more. The only thing that matters is that all of the bits from one column are written to one NAND Flash chip so that if one NAND Flash chip fails, only one column of bits will be affected.

[Figure: array of bits in columns 1 to 15 (columns 1 to 6 and 11 to 15 shown); each column is divided vertically into 6-bit symbols forming a vertical codeword, and each row of 4-bit symbols forms a horizontal codeword]
Figure 7 Encoded 2D RS Array for First Sample Configuration

The above scheme allows up to 15 4-bit symbols per horizontal codeword and up to 63 6-bit symbols per vertical codeword.

Most of the time, the column decoders will be able to correctly recover the data without the aid of the row decoder. However, if a NAND Flash chip fails, then many column decodings in one column will fail, indicating that all of the symbols in those codewords should be considered erasures by the row decoder.

With 4 redundant row symbols, up to t errors and s NAND Flash chip failures can be corrected in each row as long as 2t + s ≤ 4, which is probably more than sufficient. In other words, with no soft errors, up to 4 NAND Flash chips can fail with no loss of data.
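
The trade-off between soft errors and failed chips in one row can be tabulated. The sketch below assumes the standard Reed-Solomon property that t errors and s erasures are correctable when 2t + s is at most the number of redundant symbols, which is what allows 4 chip failures when there are no soft errors. The function names row_correctable and budget_table are hypothetical.

```python
def row_correctable(t: int, s: int, redundant_symbols: int = 4) -> bool:
    """True if t symbol errors plus s erasures fit the row code's budget."""
    return 2 * t + s <= redundant_symbols


def budget_table(redundant_symbols: int = 4):
    """(errors, maximum erased chips) pairs at the edge of the budget."""
    return [
        (t, redundant_symbols - 2 * t)
        for t in range(redundant_symbols // 2 + 1)
    ]


for t, s in budget_table(4):
    print(f"{t} soft error(s) per row with up to {s} failed chip(s)")
```

Each additional soft error to be corrected in a row costs two erasures, so a design that must survive both chip failures and soft errors in the same row needs more redundant row symbols.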

With 6 redundant 6-bit column symbols, up to 3 soft errors in 60 can be corrected in each vertical codeword. Some work should be done to determine how many vertical symbols need to be corrected for a specific type of NAND Flash device, but it’s OK for the column decoding to fail every once in a while because the row decoder can correct for multiple column decoding failures.

11.1. 2D RS Encoding for First Sample Configuration

A block diagram of the 2D RS encoder hardware for the first sample configuration is shown in Figure 8.

[Figure: Input symbols 1 to 11 -> Parallel RS Encoder -> symbols 1 to 15 -> Row Symbol to Column Symbol Converters -> Serial RS Encoders -> Write Interfaces to Flash Chips -> Flash Chips; the Pause Input signal is formed by an OR gate]
Figure 8 2D RS Encoding for First Sample Configuration

Data can be continuously input to the 2D RS encoder as long as the pause input signal is not asserted. Whenever the encoder reaches the end of a data field, it will request the input to pause while it outputs the redundancy for the current N vertical codewords. Once the pause signal is deasserted, the input can continue to input data words.

All of the pieces of logic shown in Figure 8 have been previously designed and licensed by ECC Tek.

11.2. 2D RS Decoding for First Sample Configuration

A block diagram of the 2D RS decoder for the first sample configuration is shown in Figure 9.

[Figure: Flash Chips 1 to N -> Read Interfaces to Flash Chips -> RS Column Decoders 1 to N (each with a Fail output) -> FIFOs under State Machine Control -> Column Symbol to Row Symbol Converters -> PRS Row Decoder -> Output]
Figure 9 2D RS Decoding for First Sample Configuration

The FIFOs are needed to store corrected vertical codewords from one Segment because a column decoder cannot know whether decoding has failed until the entire codeword has been output, but the PRS Row Decoder needs to receive an erasure indication at the same time it receives an input word. The Fail signal from the column decoder is the erasure indication. A column decoding failure means the entire corrected vertical codeword stored in the decoder’s output FIFO has been “erased”.

For the first sample configuration, the Column Decoder FIFO only needs to store 60 6-bit symbols.

11.3. Percentage Redundancy for First Sample Configuration

Figure 10 shows the first sample configuration has 34% redundancy. The percentage redundancy is a function of the level of protection provided. The more protection provided, the higher the percentage redundancy.

[Figure: one Segment is 15 bits wide (11 bits of data plus 4 bits of row redundancy) and 360 bits high (324 bits of data plus 36 bits of column redundancy)]
Total number of bits in Segment = 360 x 15 = 5400
Number of redundant bits in Segment = 1836
Percentage Redundancy = 1836/5400 = 0.34 = 34%
Figure 10 Percentage Redundancy for First Sample Configuration

12. NAND Flash Memory Configurations Without Data Path Width Converters

This section describes how to design 2D RS NAND Flash memories that do not require data path width converters.
Generally speaking, these configurations will be more complex and take more gates to implement than configurations with data path width converters.

We will use 6-bit column symbols and 5-bit row symbols for the second sample configuration with 4 redundant row symbols per horizontal codeword and 6 redundant column symbols per vertical codeword. For this case, up to 135-bit data words can be input in parallel, but we will input 65-bit data words = 13 5-bit symbols. The 65th bit can be forced to 0 or it can be used for some type of auxiliary data.

With 6-bit column symbols, the maximum height of a vertical RS codeword is 63 6-bit symbols = 378 bits.

In this second configuration, the row and column symbol boundaries are aligned every 30 bits since 5 and 6 have no common factors. To determine how often the row and column symbol boundaries align, we use the same method as we used for the data path width converters. That is, we find the lowest x and y so that 6x = 5y. Note that x/y = 5/6, which cannot be reduced further, so boundaries in this case will be aligned only every 30 bits. In the first configuration the row and column boundaries were aligned every 12 bits since 4/6 = 2/3.

Although Verilog code has not been developed for these designs yet, it appears that the symbol boundaries must line up at the end of the data. In other words, it appears that the height (in bits) of a data field shown in Figure 14 in a Segment for this configuration must be 30i where i is an integer. It also appears that the height of the column redundancy field shown in Figure 14 can be any number of 6-bit symbols since the PRS row decoder ignores the column redundancy fields.

The height of one Segment was chosen to be 60 6-bit symbols because 60 symbols * 6 bits/symbol = 360 bits, which is a multiple of both 5 and 6, so that one column of a Segment contains a whole number of 5-bit and 6-bit quantities/symbols. With NAND Flash Page sizes of ~4096 bytes, there will be ~92 Segments per Page.
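
The symbol-size arithmetic used for both sample configurations can be reproduced with a few helper functions. This is an illustrative sketch only; the function names are hypothetical, and it assumes the maximum RS codeword length of 2^m - 1 symbols for m-bit symbols that the text uses (15, 31 and 63 symbols for 4-, 5- and 6-bit symbols).

```python
from math import gcd


def max_codeword_symbols(symbol_bits: int) -> int:
    """Longest RS codeword, in symbols, for a given symbol width."""
    return (1 << symbol_bits) - 1


def max_data_width_bits(row_symbol_bits: int, redundant_row_symbols: int) -> int:
    """Widest data word (in bits) that one horizontal codeword can carry."""
    data_symbols = max_codeword_symbols(row_symbol_bits) - redundant_row_symbols
    return data_symbols * row_symbol_bits


def boundary_period_bits(row_symbol_bits: int, column_symbol_bits: int) -> int:
    """Row and column symbol boundaries coincide every lcm(row, column) bits."""
    g = gcd(row_symbol_bits, column_symbol_bits)
    return row_symbol_bits * column_symbol_bits // g


# First sample configuration uses 4-bit row symbols, the second uses 5-bit
# row symbols; both use 6-bit column symbols and 4 redundant row symbols.
for name, row_bits in (("first", 4), ("second", 5)):
    print(
        name,
        max_data_width_bits(row_bits, 4),
        boundary_period_bits(row_bits, 6),
    )

# Maximum height of a vertical codeword with 6-bit column symbols, in bits.
print(max_codeword_symbols(6) * 6)
```

The same functions can be used to explore other symbol sizes and redundancy levels when creating configurations beyond the two samples.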
A 2D encoded array with 5-bit row symbols outlined in red and 6-bit column symbols outlined in blue is illustrated in Figure 11. All of the bits from one column are written to one NAND Flash chip. It doesn’t matter whether the NAND Flash chip inputs and outputs 8 bits at a time or 16 bits at a time or more. The only thing that matters is that all of the bits from one column are written to one NAND Flash chip so that if one NAND Flash chip fails, only one column of bits will be affected.

[Figure 11: Encoded 2D RS Array for Second Sample Configuration. An example bit array with columns numbered 1 through 17, showing vertical codewords made up of 6-bit symbols and horizontal codewords made up of 5-bit symbols.]

The above scheme allows up to 31 5-bit symbols per horizontal codeword and up to 63 6-bit symbols per vertical codeword.

Most of the time, the column decoders will be able to correctly recover the data without the aid of the row decoder. However, if a NAND Flash chip fails, then many column decodings in one column will fail, indicating that all of the symbols in that codeword should be considered erasures by the row decoder. With 4 redundant row symbols, up to t errors and s NAND Flash chip failures can be corrected in each row as long as 2t + s ≤ 4, which is probably more than sufficient.
In other words, with no soft errors, up to 4 NAND Flash chips can fail with no loss of data. With 6 redundant 6-bit column symbols, up to 3 soft errors in 60 can be corrected in each vertical codeword. Some work should be done to determine how many vertical symbols need to be corrected for a specific type of NAND Flash device, but it’s OK for the column decoding to fail every once in a while because the row decoder can correct for multiple column decoding failures.

12.1. 2D RS Encoding for Second Sample Configuration

A block diagram of the 2D RS encoder hardware for the second sample configuration is shown in Figure 12.

[Figure 12: 2D RS Encoding for Second Sample Configuration. Blocks shown: Input with a Pause signal (through an OR gate), Parallel RS Encoder (inputs 1 through 13, outputs 1 through 17), Row Symbol to Column Symbol Converters, Serial RS Encoders, Write Interfaces to Flash Chip, and Flash Chips.]

Data can be continuously input to the 2D RS encoder as long as the pause input signal is not asserted. Whenever the encoder reaches the end of a data field, it will request the input to pause while it outputs the redundancy for the current codeword. Once the pause signal is deasserted, the input can continue to input data words.

All of the pieces of logic shown in Figure 12 have been previously designed and licensed by ECC Tek.

12.2. 2D RS Decoding for Second Sample Configuration

A block diagram of the 2D RS decoder for the second configuration is shown in Figure 13.
[Figure 13: 2D RS Decoding for Second Sample Configuration. Blocks shown: Flash Chip 1 through Flash Chip N, a Read Interface to Flash Chip for each chip, RS Column Decoders 1 through N (each with a Fail output), FIFOs, State Machine Control, Col Symbol to Row Symbol Converters, and the PRS Row Decoder with its Output.]

The FIFOs are needed to store corrected vertical codewords from one Segment because a column decoder cannot know if decoding has failed until the entire codeword has been outputted, but the PRS Row Decoder needs to receive an erasure indication at the same time it receives an input word. The Fail signal from the column decoder is the erasure indication. A column decoding failure means the entire vertical corrected codeword stored in the decoder’s output FIFO has been “erased”. For the sample configuration, the Column Decoder FIFO only needs to store 60 6-bit symbols.

12.3. Percentage Redundancy for Second Sample Configuration

Figure 14 shows the second sample configuration has 31% redundancy. The percentage redundancy is a function of the level of protection provided. The more protection provided, the higher the percentage redundancy.

[Figure 14: Percentage Redundancy for Second Sample Configuration. One Segment is 17 bits wide (13 bits of Data plus 4 bits of Row Redundancy) and 366 bits high (330 bits of Data plus 36 bits of Column Redundancy).]

Total number of bits in Segment = 366 x 17 = 6222
Number of Redundant bits in Segment = 1932
Percentage Redundancy = 1932/6222 = .31 = 31%
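
The redundancy figures for both sample configurations follow from the Segment dimensions given in Figures 10 and 14. The short Python sketch below reproduces that arithmetic; the function name and its parameter names are hypothetical and chosen only for illustration.

```python
def segment_redundancy(segment_height_bits: int, segment_width_bits: int,
                       data_height_bits: int, data_width_bits: int):
    """Return (redundant_bits, total_bits, fraction) for one Segment.

    Every bit outside the data field (row redundancy, column redundancy,
    and the corner where they overlap) is counted as redundant.
    """
    total_bits = segment_height_bits * segment_width_bits
    data_bits = data_height_bits * data_width_bits
    redundant_bits = total_bits - data_bits
    return redundant_bits, total_bits, redundant_bits / total_bits


if __name__ == "__main__":
    # First sample configuration (dimensions from Figure 10).
    print(segment_redundancy(360, 15, 324, 11))
    # Second sample configuration (dimensions from Figure 14).
    print(segment_redundancy(366, 17, 330, 13))
```

Called with the Figure 10 dimensions, the function gives 1836 redundant bits out of 5400 (34%); with the Figure 14 dimensions it gives 1932 out of 6222 (about 31%).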