Scalable Detailed Placement Legalization for Complex Sub-14nm Constraints
Kwangsoo Han, Andrew B. Kahng and Hyein Lee
{kwhan, abk, hyeinlee}@ucsd.edu
http://vlsicad.ucsd.edu/
ECE Department, UC San Diego

Outline
• Motivation & Previous Work
• Problem Formulation
• Our Approach
• Experimental Setup and Results
• Conclusion

Motivation
• In old technology nodes, once the library cells were correctly designed, design rule violations (DRVs) could not occur during placement
• Limitations of patterning resolution lead to complex design rules for front-end-of-line (FEOL) layers
• Placing several ‘legal’ standard cells next to each other may cause violations of FEOL layer rules
⇒ A final detailed cell placement phase is needed to maintain placement legality with respect to the new N10 FEOL rules

Cell Layout in N10 Node
• The FEOL layers that affect legal placement include the implant layer, the oxide diffusion layer and poly
• Implant layers decide the threshold voltage (Vt) of transistors
• Oxide diffusion (OD) defines the active region of transistors
• Dummy poly gates are inserted at the (vertical) standard cell boundaries to avoid edge device variability
[Figure: standard cell layout with pins A and Y; legend: fin, poly, oxide diffusion (OD), M1, middle of line, M2, power/ground, cell boundary/implant region]

(1) Minimum implant width (IW)
• Limitation of the current optical lithography technology ⇒ new design rule (i.e., minimum implant width)
• Two same-Vt cells are misaligned vertically ⇒ a narrow, “staircase” implant layer shape ⇒ inter-row IW (IW1) violation
• A narrow cell is surrounded by different-Vt cells ⇒ intra-row IW (IW2) violation
[Figure: IW1 and IW2 violation examples among HVT and LVT cells]

(2) Minimum OD jog length (OW)
• Cells can have different oxide diffusion (OD) region heights
• Lithographic corner rounding ⇒ minimum OD jog length rule
• Abutment of cells with different OD heights ⇒ OD jog length violation
[Figure: OD jog at a cell boundary]

(3) Drain-drain abutment (DDA)
• Dummy poly gates create extra dummy transistors
• Dummy transistors can induce leakage power
• Dummy transistors must be tied off to power/ground rails
• Two drain nodes are abutted
  • Extra dummy poly gate tied to power/ground rails
  • Cell flipping/displacement
[Figure: drain-drain abutment examples with source (S) and drain (D) nodes]

Previous Work
• Dynamic programming-based approaches
  • Optimal interleaving for intra-row optimization [Hur and Lillis, ICCAD00]
  • Row-based placement [Kahng et al., ASPDAC99, GLSVLSI04]
• Integer Linear Programming (ILP)-based approaches
  • Placement by branch-and-price [Ramachandaran et al., ASPDAC05]
  • MIP-based detailed placement [Li and Koh, ISPD12]
• DDA-aware placement
  • [Du and Wong, DATE14] propose a graph model with a shortest-path algorithm
  • Uses cell flipping and adjacent-cell swapping
  • No consideration of inter-row constraints (e.g., IW constraint)
Our work: MILP-based optimization that provides comprehensive support of N10-relevant FEOL rules

Our Contributions
• Develop a mixed integer linear programming (MILP)-based placer, called DFPlacer
  • Address new DRVs caused by complex N10 FEOL rules
  • Propose a scalable partitioning-based optimization method
• Incorporate our flow into a commercial tool-based placement and routing (P&R) flow for evaluation
• Provide insight into timing and area impacts of the dummy poly gate library cell strategy
  • Standard cells with dummy poly gates (DDA and OW violation free)
  • Standard cells without dummy poly gates

Detailed Placement Problem Formulation
• Input: placement with design rule violations
• Objective: legal placement with minimum cell displacement
• Subject to:
  • Minimum implant width (IW) constraint
  • Minimum oxide diffusion jog length (OW) constraint
  • Drain-drain abutment (DDA) constraint
[Figure: IW, OW and DDA examples among HVT and LVT cells, with OD regions, drain (D) nodes and cell boundaries marked]

Mixed-ILP Model [Li12]
• Single-cell-placement binary variable $\lambda_c^k$
  • Placement state k (location and orientation) of cell c
• Site occupation variable $s_{crq}^k$
  • Represents whether site (r,q) is occupied by cell c with placement state k
• $\lambda_c^k = \{x_c, y_c, f_c\}$, where $x_c$ ($y_c$) is the x (y) location of cell c and $f_c$ indicates whether c is flipped
[Figure: example on a site grid with corners (0, 0) and (1, 7); labels: λc1 = {0, 1, 0}, λc2 = {4, 0, 1}, sc111 = 1, sc211 = 1, sc311 = 0]
[Li12] S. Li and C.-K. Koh, “Mixed Integer Programming Models for Detailed Placement”, Proc. ISPD, 2012, pp. 87-94.

Placement Problem Formulation
Objective
  $\min \sum_{\text{cells } c} \left( |x_c - x_{c,init}| + |y_c - y_{c,init}| \right)$  ⇒ Minimize displacements
For each cell c
  $\sum_{\text{states } k} \lambda_c^k = 1$  ⇒ Select one placement state per cell
Orientation, x/y location and site occupation are determined by $\lambda_c^k$
  $f_c = \sum_{\text{states } k} f_c^k \lambda_c^k$
  $x_c = \sum_{\text{states } k} x_c^k \lambda_c^k$
  $y_c = \sum_{\text{states } k} y_c^k \lambda_c^k$
  $s_{crq} = \sum_{\text{states } k} s_{crq}^k \lambda_c^k$
Placement constraints
  $\sum_{\text{cells } c} s_{crq} \le 1$  ⇒ No overlap
  + more constraints to support IW, OW and DDA

IW Constraints Formulation
• New: $v_{rq}$, a binary vector indicating the Vt of site (r,q)
• Vt boundaries are checked with inter-/intra-row variables
• Intra-row: at a Vt boundary, at least |W| consecutive sites must have the same Vt
• Inter-row: at a Vt boundary where two vertically neighboring sites have the same Vt, that Vt must be kept for at least |W| sites in both the upper and lower rows
[Figure: two Vt-boundary examples with |W| = 3]

OW and DDA Constraints Formulation
• Pre-characterize all adjacency conditions that violate OW and/or DDA for each cell pair
• Add mutual exclusion constraints
  • If $(\lambda_{c1}^i, \lambda_{c2}^j)$ is a forbidden pair: $\lambda_{c1}^i + \lambda_{c2}^j \le 1$

Distributable Global Optimization
• Limitation of MILP-based approach ⇒ runtime
• Distributable optimization of many windows of cells
  • Split the post-route layout into small clips
  • Run optimization for each clip with fixed boundaries
  • Cells on boundaries are handled by shifting windows
[Figure: layout split into clips with fixed cells on the clip boundaries, shown for the 1st and 2nd iterations]

Overall Flow
[Flowchart, reconstructed from its box labels]
• Input: routed layout with DRVs
• DFPlacer
  • Global optimization: shift partitioning lines; solve multiple windows in parallel
  • Local optimization (remaining DRVs): make new windows; remove overlapping windows; solve multiple windows in parallel
  • Optimization for each window: ILP formulation of the complex N10 constraints (DDA, OW, IW), solved with an ILP solver (CPLEX)
  • Decision: #DRVs < δ? (N / Y)
• Cell location solution ⇒ ECO routing
• Output: routed layout with #DRVs < δ

Experimental Setup
• SP&R tools: Synopsys Design Compiler H-2013.03-SP3 and Cadence Encounter Digital Implementation System XL 13.1
• Technology: two kinds of 7nm dual-Vt libraries
  • 62 standard cells without dummy poly gates (CWOD)
  • 62 standard cells with dummy poly gates (CWD)
• Designs: AES, JPEG [OpenCores], ARM Cortex M0, ARM Cortex M0 x 3
  • *_d: implemented with the CWD library
  • *_nd: implemented with the CWOD library
[OpenCores] http://opencores.com/

Design     #Inst   LVT (%)   Util. (%)   WL (um)   Area (um2)
M0_nd       8260   52        77          114685     7668
AES_nd     12147   54        78          142294     8894
M0x3_nd    27248   56        80          392540    24463
JPEG_nd    47948   51        77          694624    49629
M0_d        8238   51        77          116866     8668
AES_d      12491   54        80          150632    10596
M0x3_d     26690   55        79          409579    27400
JPEG_d     48317   52        77          764738    55824

[Figure: inverter cell layouts (pins A, Y) in the CWD library and in the CWOD library; legend: fin, poly, oxide diffusion (OD), M1, middle of line, M2, power/ground, cell boundary/implant region]

Experimental Results (1)
• Report ∆wirelength (∆WL) and ∆worst setup slack (∆WSS)
• Up to 3.42% wirelength increase
  • *_nd cases show similar or slightly larger ∆WL% than *_d
• ∆WSS ranges from -19ps to 68ps
  • Positive ∆WSS ⇒ there is room to improve timing
[Figure: plots of ∆WL (%) and ∆WSS (ps)]

Experimental Results (2)
• Global optimization fixes ~90% of DRVs
• Runtimes of global optimization using the CWOD library are 1.8x larger than those using the CWD library (except for Cortex M0)
• The runtime of the global optimization phase can be further reduced with more computing resources
[Figure: remaining violations (%) versus runtime (sec) for m0_nd, aes_nd, m0x3_nd, jpeg_nd, m0_d, aes_d, m0x3_d and jpeg_d; annotations: "Global optimization (3rd iteration); ~90% violations are fixed" and "1.8x"]

Experimental Results (3)
• DFPlacer fixes 99% of design rule violations

Design     Init. IW #vio.   Init. DDA/OW #vio.   Final total #vio.
M0_nd       926              1611                  25
AES_nd     1771              1900                  34
M0x3_nd    3514              4230                  65
JPEG_nd    4056             12024                 164
M0_d        988                 0                  10
AES_d      1566                 0                  11
M0x3_d     2810                 0                  27
JPEG_d     6296                 0                  43

• Example solution
[Figure: example solution; annotations: DDA violation, OW violation, IW violation (twice), "flipped", "Cells are moved"]

Conclusion and Future Work
• Propose a scalable detailed placement legalization flow for complex FEOL constraints arising at the foundry 10nm node
• Constraints include minimum implant width, minimum OD jog rules and drain-drain abutment
• Fixes 99% of DRVs with a 3% increase in wirelength and minimal impact on timing
• Future work
  • Timing- and wirelength-driven placement legalization
  • “Smart ECO” method for the few DRVs remaining after global placement legalization

Thank you!

Experimental Setup: Designs and Technologies
• Minimum OD jog length = 4 site widths
• Minimum implant width = 4 site widths
• Number of violations over cell pairs
  • Minimum implant width rule violation: 7172 out of 15376 (= 62 x 62 x 2 x 2)
  • Minimum OD jog length rule violation: 280 out of 15376
• 7nm cell library with scaled 28nm BEOL (back-end-of-line) LEF
  • Site width/height: 0.136/0.9 um
[Figure: OAI22 in the 7nm node (pins A1, A0, B0, B1, Y), scaled by 2.5x to the minimum M1 and M2 pitches of the 28nm node]

Scaling of 7nm Cells
• Scale 7nm cells by 2.5x
• Left figure is the scaled OAI22_X1
  • All the pins are on track with 0.135um M1 vertical pitch
  • However, Encounter does not work with 0.135um M1 vertical pitch
• Right figure shows the modified OAI22_X1 (fit to 0.136um M1 vertical pitch)
  • Increase width from 0.81um to 0.816um (= 0.81 + (0.81/135))
  • Shift the pins to be aligned to the vertical tracks with 0.136um pitch
[Figure: scaled (left) and modified (right) OAI22_X1 with pins A1, A0, B0, B1, Y; left: width 0.81, track spacing 0.135, edge offsets 0.067 and 0.068; right: width 0.816, track spacing 0.136, edge offsets 0.068; cell height 0.9; additional dimension labels 0.1 and 0.05; all in um]
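
Backup: checking the scaling arithmetic
The width adjustment and the cell-pair count quoted in the backup slides can be checked with a few lines of Python. This is only a sketch: the variable and function names are illustrative, and the half-pitch track offset is read off the dimension labels in the figure (0.068 um edge offsets), not stated in the text.

```python
from fractions import Fraction

# Values taken from the slides, expressed in nm to keep the arithmetic exact.
OLD_PITCH_NM = 135   # M1 vertical pitch after 2.5x scaling (0.135 um)
NEW_PITCH_NM = 136   # pitch used for the modified cell (0.136 um)
OLD_WIDTH_NM = 810   # scaled OAI22_X1 width (0.81 um)

def snap_width(width_nm, old_pitch_nm, new_pitch_nm):
    """Keep the number of tracks fixed while moving to the new pitch."""
    tracks = Fraction(width_nm, old_pitch_nm)
    return tracks * new_pitch_nm

tracks = Fraction(OLD_WIDTH_NM, OLD_PITCH_NM)
new_width_nm = snap_width(OLD_WIDTH_NM, OLD_PITCH_NM, NEW_PITCH_NM)

# Same result as the slide's expression 0.81 + (0.81 / 135), in nm.
slide_expr_nm = OLD_WIDTH_NM + Fraction(OLD_WIDTH_NM, OLD_PITCH_NM)

# Assumption from the figure labels: tracks sit at a half-pitch offset.
track_positions_nm = [NEW_PITCH_NM // 2 + i * NEW_PITCH_NM for i in range(int(tracks))]

# Cell-pair adjacency cases: 62 cells x 62 cells x 2 x 2 (as on the slide).
pair_cases = 62 * 62 * 2 * 2

print(tracks, new_width_nm, slide_expr_nm, track_positions_nm, pair_cases)
```

Working in integer nanometres with `Fraction` avoids floating-point noise when comparing 0.816 against 0.81 + 0.81/135.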
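
Backup: toy instance of the state-selection model
As a recap of the formulation slides (one placement state per cell, no site overlap, mutual exclusion of forbidden state pairs, minimum displacement), the sketch below solves a tiny single-row instance by exhaustive enumeration. Everything here is hypothetical: the two cells, their edge types, and the row size are made up for illustration, the y coordinate is dropped because there is only one row, and brute-force search stands in for the MILP solver (CPLEX) used in the actual flow.

```python
from itertools import product

# Hypothetical toy instance: one row of 6 sites, two cells of width 2.
# "edges" gives the (left, right) diffusion type of the unflipped cell:
# "S" = source, "D" = drain. A state is (x, flipped).
ROW_SITES = 6
cells = {
    "c1": {"width": 2, "x_init": 1, "edges": ("S", "D")},
    "c2": {"width": 2, "x_init": 3, "edges": ("D", "S")},
}

def states(cell):
    """All candidate placement states (x, flipped) of a cell."""
    w = cells[cell]["width"]
    return [(x, f) for x in range(ROW_SITES - w + 1) for f in (0, 1)]

def sites(cell, state):
    """Sites occupied in this state (the entries with s = 1)."""
    return set(range(state[0], state[0] + cells[cell]["width"]))

def edge(cell, state, side):
    """Edge type on the given side, accounting for flipping."""
    left, right = cells[cell]["edges"]
    if state[1]:
        left, right = right, left
    return left if side == "left" else right

def forbidden(s1, s2):
    """Stand-in for a pre-characterized DDA rule: abutting D-D edges."""
    (a, sa), (b, sb) = sorted([("c1", s1), ("c2", s2)], key=lambda t: t[1][0])
    abut = sa[0] + cells[a]["width"] == sb[0]
    return abut and edge(a, sa, "right") == "D" and edge(b, sb, "left") == "D"

def solve():
    """Pick one state per cell, minimizing total x displacement."""
    best = None
    for s1, s2 in product(states("c1"), states("c2")):
        if sites("c1", s1) & sites("c2", s2):   # no-overlap constraint
            continue
        if forbidden(s1, s2):                   # mutual exclusion constraint
            continue
        cost = (abs(s1[0] - cells["c1"]["x_init"])
                + abs(s2[0] - cells["c2"]["x_init"]))
        if best is None or cost < best[0]:
            best = (cost, s1, s2)
    return best

best = solve()
print(best)
```

In this toy case the initial placement abuts two drain edges, and the search resolves it by flipping a cell in place, so the total displacement stays at zero. Enumeration is only workable at this scale; the window-based MILP in the deck exists precisely because the state space grows too quickly for exhaustive search.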