A Survey of DDR4 SDRAM Design Improvement Methods
16 January 2014
Edmund Leong 0260814
NCTU Memory Systems IEE5011 FALL 2013

Overview
• Introduction - DDR4 Specifications
• A Far End Cross Talk Cancellation Method
• Driver Design
• A Low Jitter DLL Design
• Fast Parallel CRC and DBI Calculation Method
• Conclusion

DDR4 Specifications (1/3)
• $P = \alpha C_L V_{DD}^2 f$
• DDR → DDR4
  • f ↑ 8x
  • VDD ↓ 2.75x
• Based on the simplified equation, power consumption still increases.
• Other methods are introduced to reduce power consumption.

DDR4 Specifications (2/3)
• Change from center-tapped termination (CTT)/SSTL to pseudo open drain (POD)
• Reduces the VDD-to-GND current path when DQ is logic high.

DDR4 Specifications (3/3)
• Data Bus Inversion (DBI)
• 2.5 V VPP for word lines
• CRC protection
• CA parity error detection
• Point-to-point topology

Far End Crosstalk Cancellation Method (1/4)
• Crosstalk cancellation methods:
  • Circuit implementation
  • Wider spacing between signal traces
  • Use of via stub capacitance

Far End Crosstalk Cancellation Method (2/4)
• Far end crosstalk can be reduced by using via stubs
• Inter Symbol Interference (ISI) is not affected
• Resonant frequency is over 10 GHz
$$V_{FEXT}(t) = \frac{t_{flight}}{2}\left(\frac{C_m}{C} - \frac{L_m}{L}\right)\frac{dV_{aggr}(t - t_{flight})}{dt}$$

Far End Crosstalk Cancellation Method (3/4)
• 8-port S-parameters are measured up to 20 GHz with a vector network analyzer
• Resonance caused by the stub starts around 15 GHz

Far End Crosstalk Cancellation Method (4/4)

Driver Design (1/5)
• Type 0: standard termination
• Type I: switched termination pre-emphasis
• Type II: constant termination de-emphasis

Driver Design (2/5)
• Type 0: standard termination
• R2 is always open
• Always drives with Rs termination
• No boost to high-frequency content

Driver Design (3/5)
• Type I: switched termination pre-emphasis
• R2 termination is active only during a transition bit
• Termination during a transition bit is Rs||R2.
• Termination during a non-transition bit is Rs only.
• The level of pre-emphasis is controlled by Rs and R2

Driver Design (4/5)
• Type II: constant termination de-emphasis
• R1 and R2 are in series, with a Thevenin equivalent of Rs.
• Transition bit is driven by Rs.
• Non-transition bit is driven by the R1-R2 network

Driver Design (5/5)
Simulation of DDR4 2400 MT/s, 1 DIMM per channel

Driver and termination    Best termination value Rs - Rt (Ω)    Best eye width (ps)
Type 0, VDDQT             40 - 60                               187
Type I, VDDQT             40 - 60                               187
Type II, VDDQT            40 - 120                              232
Type II, CTT              40 - 120                              228

• With optimized resistor values, the choice between VDDQT and CTT termination has minimal effect on the performance of the net

Low Jitter DLL Design (1/4)

Low Jitter DLL Design (2/4)
• Conventional charge pump
$$I_D = \frac{1}{2}\mu_{n,p} C_{ox}\frac{W}{L}\left(V_{GS} - V_{TH}\right)^2\left(1 + \lambda V_{DS}\right)$$

Low Jitter DLL Design (3/4)
• New charge pump design

Low Jitter DLL Design (4/4)

                            TVLSI'10         JSSC'11            Proposed'13
Process (nm)                54               130                90
DRAM interface              GDDR3            DDR                DDR4
Supply (V)                  1.8              1.2                1.2
DLL type                    ADDLL            ADDLL              ADDLL
Frequency (GHz)             1.4              0.11-1.4           1.6
Peak-to-peak jitter (ps)    29 @ 1.4 GHz     15.11 @ 1.4 GHz    12.33 @ 1.6 GHz
Power (mW)                  29.5 @ 1 GHz     74.4 @ 1.4 GHz     33.6 @ 1.6 GHz
Area (mm²)                  0.11             0.387              0.047

Proposed: Design and Diagnostics of Electronic Circuits & Systems (DDECS), IEEE International Symposium 2013

Fast Parallel CRC and DBI Calculation Method (1/7)
• DDR4 introduces CRC using the ATM-8 HEC polynomial
• The CRC calculation is based on DBI-inverted data
• DDR4 adds the CRC value at the end of the data burst

Fast Parallel CRC and DBI Calculation Method (2/7)
• CLmin = tCore + Max(0, tCRC - tPrep) + tAlign
• tCalc + Flight time 1 + Flight time 2 < 4 nCK
• In 3.2 Gbps DDR4, the calculation time constraint is about 1.2 ns

Fast Parallel CRC and DBI Calculation Method (3/7)
• (a) has internal nodes that do not swing full rail.
  • Vdd-Vth swing
• (c) has internal nodes with full rail swing
  • An inverter prevents a long chain of transmission gates

Fast Parallel CRC and DBI Calculation Method (4/7)
• DBI is activated when more than half of the DQ bits are 0.
• The inputs of each CRC calculation are determined by bit mapping (e.g. gray boxes).
• Serial DBI and CRC calculations are too inefficient
• A parallel method is needed

Fast Parallel CRC and DBI Calculation Method (5/7)
• CRC starts with all DBI bits = 0
• For each CRC[i], the information needed for post-processing CRC+DBI correction:
  • Inclusion of DBI#[k] in CRC[i]
  • Oddness of DQ bits associated with burst k and CRC[i]
  • Actual DBI#[i]
• D[k] = self’ * Odd * DBI#[0]’ + self * Even * DBI#[k] + self * Odd
• CRC_new[i] = CRC[i] xor D[0] xor … … xor D[7]

Fast Parallel CRC and DBI Calculation Method (6/7)
• DBI#[k] inputs enter at the third stage of the XOR tree
• Critical path is one tXOR longer than the XOR tree

Stage    Input slots    CRC[i] inputs    Empty slots
1        64             37               27
2        32             19               13
3        16             10               6
4        8              5                3
5        4              3                1
6        2              2                0

Fast Parallel CRC and DBI Calculation Method (7/7)

Conclusion
• DDR4 specifications require very high speeds, which makes signal integrity important
• Transmission line theory is important for impedance matching in termination to reduce reflections and crosstalk.
• Crosstalk can be minimized with closely placed via stubs
• Driver design with constant termination de-emphasis can widen the eye diagram
• A good DLL design is needed to reduce jitter
• Parallel CRC + DBI calculations can relax speed constraints

References
• E. Desjardins (2012, Sept. 12). JEDEC Announces Publication of DDR4 Standard [Online]. Available: http://www.jedec.org/news/pressreleases/jedec-announces-publication-ddr4-standard
• DDR4 SDRAM, JEDEC standard JESD79-4, Sept. 2012.
• D. Wang (2013, Dec. 3). Why migrate to DDR4? [Online]. Available: http://www.eetimes.com/document.asp?doc_id=1280577
• H. Goto (2010, Aug. 16). Towards Next-Generation 4Gbps DDR4 Memory [Online]. Available: http://pc.watch.impress.co.jp/docs/column/kaigai/20100816_387444.html
• C-M. Nieh, J. Park, “Far-end Crosstalk Cancellation using Via Stub for DDR4 Memory Channel,” in IEEE 63rd Electronic Components and Technology Conference (ECTC), pp. 2035-2040, 2013.
• N. Pham, D. Dreps, R. Mandrekar, N. Na, “Driver Design for DDR4 Memory Subsystems,” in IEEE 19th Electrical Performance of Electronic Packaging and Systems (EPEPS), pp. 297-300, 2010.
• Y-H. Tu, K-H. Cheng, H-Y. Wei, H-Y. Huang, “A Low Jitter Delay-Locked-Loop Applied for DDR4,” in IEEE 16th Design and Diagnostics of Electronic Circuits and Systems (DDECS), pp. 98-101, 2013.
• H-H. Chang, J-Y. Chang, C-Y. Kuo, S-I. Liu, “A 0.7-2GHz Self-Calibrated Multiphase Delay-Locked Loop,” IEEE Journal of Solid-State Circuits, vol. 41, no. 5, May 2006.
• W-J. Yun, H-W. Lee, D. Shin, S. Kim, “A 3.57 Gbps Low Jitter All Digital DLL with Dual DCC Circuit for GDDR3 DRAM in 54nm CMOS Technology,” in IEEE Trans. on VLSI, 2010.
• Y-S. Kim, S-K. Lee, H-J. Park, J-Y. Sim, “A 110MHz to 1.4GHz locking 40-phase all-digital DLL,” in IEEE Journal of Solid-State Circuits, vol. 46, no. 2, pp. 435-444, Feb. 2011.
• J. Moon, J. S. Kih, “Fast Parallel CRC & DBI Calculation for High-speed Memories: GDDR5 and DDR4,” in IEEE International Symposium on Circuits and Systems (ISCAS), pp. 317-320, 2011.
• K. Lin, C. Wu, “A Low-cost Realization of Multiple-input Exclusive-OR gates,” ASIC Conference and Exhibit, Proceedings of the 8th Annual IEEE International, pp. 307-310, Sept. 1995.
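
Supplementary sketch for the Fast Parallel CRC and DBI Calculation Method slides: the serial dependency those slides describe (DBI must be decided before the CRC can be computed over the inverted data) can be illustrated with a small bit-serial reference model. This is only a sketch under stated assumptions, not the parallel method of Moon and Kih and not the JEDEC bit mapping. The function names, the MSB-first bit order, and the placement of the DBI# bit after each byte are hypothetical choices made for illustration. The generator polynomial x^8 + x^2 + x + 1 is the ATM-8 HEC polynomial.

```python
# Bit-serial reference model: DBI encoding followed by CRC-8 (ATM-8 HEC).
# Assumptions (not from the slides): MSB-first bit order, initial CRC of 0,
# and one DBI# bit appended after each data byte.

POLY = 0x07  # x^8 + x^2 + x + 1, with the x^8 term implicit


def crc8_atm(bits):
    """Return the CRC-8 of an iterable of 0/1 values, processed one bit at a time."""
    crc = 0
    for bit in bits:
        feedback = ((crc >> 7) & 1) ^ bit  # MSB of the register XOR the incoming bit
        crc = (crc << 1) & 0xFF
        if feedback:
            crc ^= POLY
    return crc


def dbi_encode(byte):
    """Return (dbi_n, transmitted_byte) for one 8-bit DQ lane.

    The byte is inverted when more than half of its bits (more than four) are 0.
    dbi_n is modeled as active low: 0 means the byte was inverted.
    """
    byte &= 0xFF
    zeros = 8 - bin(byte).count("1")
    if zeros > 4:
        return 0, (~byte) & 0xFF
    return 1, byte


def burst_crc(data_bytes):
    """Serial flow: DBI-encode each byte, then run the CRC over data plus DBI# bits."""
    bits = []
    for b in data_bytes:
        dbi_n, tx = dbi_encode(b)
        bits.extend((tx >> i) & 1 for i in range(7, -1, -1))  # MSB first
        bits.append(dbi_n)
    return crc8_atm(bits)


if __name__ == "__main__":
    burst = [0x00, 0xFF, 0x0F, 0x07, 0xA5, 0x01, 0x80, 0x3C]
    print([dbi_encode(b) for b in burst])
    print(hex(burst_crc(burst)))
```

In this model the CRC cannot start until every DBI decision is known, which is the latency that the parallel approach in the slides avoids by computing the CRC with all DBI bits assumed to be 0 and applying the D[k] correction terms afterwards.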