System-Level Hardware-Based Protection of Memories against Soft-Errors
Valentin Gherman, Samuel Evain, Mickael Cartron, Nathaniel Seymour, Yannick Bonhomme
Laboratoire d'Intégration des Systèmes et des Technologies

Motivation
New constraints
– Increasing design & manufacturing costs
– Decreasing time-to-market
– Increasing reliability and yield problems of nanometer technologies
  • Memory systems remain the most vulnerable
New requirements
– Low-cost solutions
  • Cross-domain & cross-application platform-based design
– Flexible solutions
  • Power, Performance, Reliability

Related EDAC-based memory protection schemes
Low Cost [1, 3]
– Flexibility
– Standard interconnect & memory
Hardware-based [1, 2]
– Concurrent error detection
– Transient & permanent faults
System-level [3]
– Software-based [3]
[Figure: Processor Core, Interconnection [1, 2], Standard Memory holding Data Words and check codes EDAC1 … EDACn, EDAC1 … EDACm]

[1] GRLIB IP Core User’s Manual, Version 1.0.19, September 2008, pages 227, 248
[2] R. Mariani, G. Boschi, Solid-State Electronics 49, 2005
[3] P.P. Shirvani, N. Saxena, E.J. McCluskey, Transactions on Reliability, September 2000

Reliability Service Manager (RSM)
Low Cost [1, 3, RSM]
– Flexibility
– Standard interconnect & memory
Hardware-based [1, 2, RSM]
– Concurrent error detection
– Transient & permanent faults
System-level [3, RSM]
[Figure: Processor Core [3], RSM, Standard Interconnection [1, 2], Standard Memory holding Data Words and check codes EDAC1 … EDACn, EDAC1 … EDACm]

RSM: address calculation of EDAC codes
@EDAC = OffSet + [ (Mask & @DW) >> log2(n) ]
EDAC code position in a memory word = @DW % n
– @DW = address of a protected data word (DW)
– OffSet, Mask are parameters
– n = DW width / EDAC code width
[Figure: Standard Memory holding Data Word (DW) 1 … Data Word (DW) n, followed at OffSet by the Check Words, each packing EDAC 1 … EDAC n]

RSM plugged on the bus arbiter (1)
– Scales well with the number of masters in the system
– Hides the supplementary RSM-memory accesses from the arbiter
[Figure: Processor (Master 1) … Processor (Master n), Bus arbiter, RSM, Main Memory (Slave)]

RSM plugged on the bus arbiter (2)
RSM interface: AHB master to AHB slave
[Figure: AHB Arbiter, AHB Master, RSM, AHB Slave (Memory). Signals: MWCTRL, MADDR, MWDATA, SWCTRL, SADDR, SWDATA, WCTRL, ADDR, WDATA. Inside the RSM: MUX1, MUX2, Check bits, Data bits]

RSM plugged on the bus arbiter (3)
RSM interface: AHB slave to AHB master
[Figure: AHB Arbiter, AHB Master, RSM, AHB Slave (Memory). Signals: SRCTRL, SRDATA, MRCTRL, MRDATA, RCTRL, RDATA. Inside the RSM: MUX3]

RSM Implementation
– 2.5 ns & 4154 NAND2 (130 nm HCMOS9)
– Clock cycle overhead (MiBench benchmarks)
[Figure: two bar charts of clock cycle overhead for Parity and SEC-DED on StringSearch, FFT and BasicMath, one for a processor without memory cache and one for a processor with memory cache; the vertical axes run from 0% to 30% and from 0% to 10%]

RSM associated with each master on the interconnection sub-system
– Larger fault coverage of the interconnection sub-system
– Better system performance
[Figure: Processor (Master 1) with RSM … Processor (Master n) with RSM, Interconnection, Main Memory (Slave)]

RSM as an MMU wrapper
Physical address space
– Page-level granularity of the protected zones
Virtual address space
– Number of protection zones equal to the number of integrity levels
Protects MMU-generated memory accesses
[Figure: Processor, RSM, MMU, Main Memory]

RSM with solid-state secondary storage sub-system
Transfers protected by RSM
– Blocks with protected data from memory to secondary storage
Unprotected mode transfers
– All blocks from secondary storage to secondary storage
– Blocks with unprotected data and checksums from memory to secondary storage
[Figure: Processor, DMA, RSM, RSM, Interconnection, Secondary Storage, Main Memory]

Conclusions
Low cost
– Flexible: cross-domain & cross-application
– Standard memory, storage & interconnections
– Easy integration into the system (IP core)
  • Small size (same size as a UART)
– No modification of application software
– Low impact on system performance
Yield & Reliability
– Permanent & transient faults
Programmability & Flexibility
– Size, location & integrity levels of protected zones
– Programmable
  • Offset & Mask parameters
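
The address calculation from the "RSM: address calculation of EDAC codes" slide can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the RSM hardware logic: it assumes that addresses are word indices, that n is a power of two, and that the operators lost from the slide formula are an addition for OffSet and a bitwise AND for Mask. The names RsmConfig and edac_location are hypothetical and do not come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RsmConfig:
    """Hypothetical container for the programmable RSM parameters."""
    offset: int  # OffSet: word address of the first check word (assumed)
    mask: int    # Mask: selects the address bits inside the protected zone
    n: int       # EDAC codes per memory word = DW width / EDAC code width

    def __post_init__(self) -> None:
        # log2(n) is used as a shift amount, so n must be a power of two.
        if self.n <= 0 or self.n & (self.n - 1):
            raise ValueError("n must be a positive power of two")


def edac_location(cfg: RsmConfig, dw_addr: int) -> tuple[int, int]:
    """Return (check word address, EDAC code position inside that word).

    Assumed reading of the slide:
      @EDAC    = OffSet + ((Mask & @DW) >> log2(n))
      position = @DW % n
    """
    shift = cfg.n.bit_length() - 1  # log2(n) for a power of two
    check_word_addr = cfg.offset + ((cfg.mask & dw_addr) >> shift)
    position = dw_addr % cfg.n
    return check_word_addr, position


if __name__ == "__main__":
    # Example values are made up: 32-bit data words with 8-bit EDAC codes
    # give n = 4 codes per check word.
    cfg = RsmConfig(offset=0x1000, mask=0x0FFF, n=4)
    for dw_addr in range(4, 8):
        addr, pos = edac_location(cfg, dw_addr)
        print(f"DW @{dw_addr:#06x} -> check word @{addr:#06x}, position {pos}")
```

With these example parameters, data words 4 to 7 all map to the check word at 0x1001 and occupy positions 0 to 3 in it, so n consecutive data words share one check word.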