Multi-Product Floorplan Optimization Framework for Chip Multiprocessors
Marco Escalante¹, Andrew B. Kahng², Michael Kishinevsky¹, Umit Ogras³ and Kambiz Samadi⁴
¹Intel Corp., ²ECE and CSE, University of California at San Diego, ³School of ECEE, Arizona State University, ⁴Qualcomm Research
SLIP 2015

Outline
• Big Picture and Motivation
• Background on Tile-level Floorplanning
• Multi-product Chip Floorplanner
– Generic Formulation
– Choppability constraints for multi-product optimization
• Experimental results
• Conclusions and future work

Big Picture
• Interconnection networks commonly used in industry
– Servers: ring and mesh
– Graphics / throughput computing: mesh
– Clients: rings
• Cyclic dependency between interconnection network and floorplan
– Interconnection network depends on tile and chip floorplan
– Floorplan depends on interconnection network
• Both floorplan and interconnect topology affect Power/Performance/Area
[Figure: grid of core and cache tiles; annotation: "should be wide enough to support link width"]

Current Examples: Chip Multiprocessors
• Last-level cache (LLC)
• Memory controllers (MC) and channels
• I/O controller(s)
• QPI controller(s)
• Power control unit (PCU)
• …
• Same resources (building blocks) used for many SKUs
[Figure: example chip built from blocks labeled PCU, Memory Controller (MC), Core* (C), R and LLC]
* Picture of a low core count system is drawn for illustrative purposes.
The “Core” box includes mid-level caches and other blocks common to all cores.

Multi-product FP Optimization
• Different SKUs with varying requirements
– Different number of cores, memory channels, I/O agents
– …yet share the same building blocks
• Make the FP choppable: do the optimization once and re-use it for all SKUs
• The Intel Xeon server processor Haswell had 27 different SKUs, with the number of cores ranging from 4 to 18
[Figure: IV Town (15 cores) floorplan with core and cache tiles, QPI 0/1/2, PCIe, PCU, IIO, R3CSI, MC and VMSE 0 to 3 blocks]

Overview of Approach
• Goal: Develop an efficient and robust floorplan optimization framework for server products
• Involves floorplanning at two levels of hierarchy:
– (1) tile-level, ~10-20 resources
– (2) chip-level, many tiles (> 20 tiles)
• Tile-level FP considers the physical constraints due to interconnect
• Chip-level FP addresses choppability constraints by simultaneously optimizing the FP across product classes

Tile Floorplanning
• Objective: Minimize area
• Subject to:
– Global routing constraints
– NoC link width
• Major resources
– Core, LLC and MLC caches, Core-MLC interface, LLC/MLC-Ring interface, snoop filter, etc.
– Resources can be either hard or soft
– Hard blocks can rotate 90°
• Approach: Mixed-integer linear programming (MILP)
• Since tile-level FP is not the focus of the paper, only its major distinguishing properties are mentioned
Reference: S. Sutanthavibul, E. Shragowitz and J. B. Rosen, “An Analytical Approach to Floorplan Design and Optimization”, IEEE Trans. on CAD, 10(6), 1991, pp. 761-769.
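As background for the MILP approach, pairwise non-overlap of two blocks is commonly encoded with two binary selector variables and big-M terms. The block below is a sketch under stated assumptions, not the constraints actually used in the tile floorplanner (the slides do not show them): $(x_i, y_i)$ is assumed to be the lower-left corner of block $i$, $w_i$ and $h_i$ its width and height, $W$ and $H$ upper bounds on the tile width and height, and $p_{ij}, q_{ij}$ hypothetical binary variables that select one of four relative positions.

```latex
\begin{align}
x_i + w_i &\le x_j + W\,(p_{ij} + q_{ij})       && \text{($i$ left of $j$ when } p_{ij}=0,\ q_{ij}=0) \\
x_i - w_j &\ge x_j - W\,(1 - p_{ij} + q_{ij})   && \text{($i$ right of $j$ when } p_{ij}=1,\ q_{ij}=0) \\
y_i + h_i &\le y_j + H\,(1 + p_{ij} - q_{ij})   && \text{($i$ below $j$ when } p_{ij}=0,\ q_{ij}=1) \\
y_i - h_j &\ge y_j - H\,(2 - p_{ij} - q_{ij})   && \text{($i$ above $j$ when } p_{ij}=1,\ q_{ij}=1) \\
p_{ij},\, q_{ij} &\in \{0, 1\}
\end{align}
```

For each value of $(p_{ij}, q_{ij})$ exactly one inequality is tight and the other three are relaxed by the big-M terms, so the two blocks are separated in at least one direction.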
Constraints Imposed by Chip FP
• Routing constraint: Blocks i and j should not overlap in the X and Y directions
• Adjacency constraint: Blocks i and j should be adjacent
[Figure: relative placements of blocks i and j, with CORE and Router blocks; other block names are masked as XXX]

Chip-Level Floorplan Overview
• The floorplan of each product class can be easily derived through a chopping operation
• Differences with respect to the tile floorplan
– Overlap constraints are met by default
– Integer linear programming formulation
– Simultaneous floorplan optimization across multiple product classes
[Figure: tile grids (rows along y, columns along x) of Core and MC cells for products P1, P2 and P3, with chopped regions marked]

Preliminaries and Notations
• We use 1-hot binary variables u_{ij} such that
– u_{ij} = 1 means the cell (i,j) is occupied
– u_{ij} = 0 means the cell (i,j) is empty
• We need to extend the definition to multiple floorplans
– u^s_{ij} represents the cell (i,j) in FP “s”
– Multiple types of cells: Core, Memory Controller (MC), Empty
– u^s_{ij0} = 1 means an empty cell at (i,j) in FP “s”
– u^s_{ij1} = 1 means a Core at cell (i,j) in FP “s”
– u^s_{ij2} = 1 means an MC at cell (i,j) in FP “s”
– Our formulation can consider k resource types
• Example at the right-hand side (FP S0)
– u^0_{001} (Core), u^0_{011} (Core)
– u^0_{101} (Core), u^0_{112} (MC)
[Figure: 2 × 2 example grids for floorplans S0, S1 and S2 with cells (0,0), (0,1), (1,0), (1,1); legend: Core, MC, Empty]

Generic Problem Formulation
• Goal: find the {u^s_{ijk}}'s that
• Minimize the sum of the half-perimeters of all products
• Constraints on number of resources
– Each tile can be occupied by only one type of resource
– Each product has a specified number of instances of each resource
• Monotonicity constraints: Suppose product i can be chopped to j
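The objective and the monotonicity requirement can be illustrated on the deck's 2 × 2 example floorplans S0, S1 and S2. The sketch below is a plain checker, not the ILP itself. It assumes one reading of monotonicity (every non-empty cell of the smaller product holds the same resource at the same position in the larger product) and measures the half-perimeter as used rows plus used columns, scaled by the tile size. The function names and the integer cell codes are hypothetical.

```python
# Cell codes follow the deck's resource types: 0 = Empty, 1 = Core, 2 = MC.
EMPTY, CORE, MC = 0, 1, 2


def is_choppable(big, small):
    """Return True if `small` can be derived from `big` by emptying cells.

    Assumed reading of monotonicity: each non-empty cell of the smaller
    product has the same resource at the same (i, j) in the larger product.
    Both grids must have the same dimensions.
    """
    return all(
        s == EMPTY or s == b
        for big_row, small_row in zip(big, small)
        for b, s in zip(big_row, small_row)
    )


def half_perimeter(grid, tile_w=1, tile_h=1):
    """Height plus width of a product, counting only rows/columns in use."""
    used_rows = sum(1 for row in grid if any(c != EMPTY for c in row))
    used_cols = sum(1 for col in zip(*grid) if any(c != EMPTY for c in col))
    return tile_h * used_rows + tile_w * used_cols


# The 2 x 2 example floorplans, indexed as grid[i][j].
S0 = [[CORE, CORE], [CORE, MC]]
S1 = [[EMPTY, CORE], [EMPTY, MC]]
S2 = [[EMPTY, CORE], [EMPTY, EMPTY]]

chain_ok = is_choppable(S0, S1) and is_choppable(S1, S2)
total_half_perimeter = sum(half_perimeter(g) for g in (S0, S1, S2))
```

In the ILP the same two ideas appear as linear constraints and a linear objective over the binary variables; the checker only verifies a given solution.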
Choppability
• Solution = finding the {u^s_{ijk}}'s
• Example at the right-hand side; each cell (i,j) of FP “s” is listed as the pair {u^s_{ij0}, u^s_{ij1}}, where in this example {0,1} denotes a Core, {1,0} an MC and {0,0} an Empty cell
– S0: {u^0_{000}, u^0_{001}} = {0,1} (Core), {u^0_{010}, u^0_{011}} = {0,1} (Core)
– S0: {u^0_{100}, u^0_{101}} = {0,1} (Core), {u^0_{110}, u^0_{111}} = {1,0} (MC)
– S1: {u^1_{000}, u^1_{001}} = {0,0} (Empty), {u^1_{010}, u^1_{011}} = {0,1} (Core)
– S1: {u^1_{100}, u^1_{101}} = {0,0} (Empty), {u^1_{110}, u^1_{111}} = {1,0} (MC)
– S2: {u^2_{000}, u^2_{001}} = {0,0} (Empty), {u^2_{010}, u^2_{011}} = {0,1} (Core)
– S2: {u^2_{100}, u^2_{101}} = {0,0} (Empty), {u^2_{110}, u^2_{111}} = {0,0} (Empty)
• Chopping a cell means that its Core or MC is converted to Empty
[Figure: 2 × 2 example grids for floorplans S0, S1 and S2; legend: Core, MC, Empty]

Core/MC Count Constraints
• Assume
– N^s_{Core} = number of cores in FP “s”
– N^s_{MC} = number of MCs in FP “s”
• u^s_{ij1} = 1 only if there is a Core in the cell, so
\[ \sum_{i} \sum_{j} u^s_{ij1} = N^s_{Core}, \quad \text{for } s = 0, 1, 2 \]
• u^s_{ij2} = 1 only if there is an MC in the cell, so
\[ \sum_{i} \sum_{j} u^s_{ij2} = N^s_{MC}, \quad \text{for } s = 0, 1, 2 \]
• Example: N^0_{Core} = 3, N^1_{Core} = 1, N^2_{Core} = 1, N^0_{HA} = 1, …

Height and Width Computations
• To express area, we need a way of representing height and width; with multiple products there is one height and one width per floorplan
• For each product class i, on a grid with R rows, C columns and K resource types (h and w denote the tile height and width):
• Row r is used:
\[ \mathit{used}^i_r = \begin{cases} 1, & \text{if } u^i_{rck} = 1 \text{ for some } 0 \le c \le C-1,\ 1 \le k \le K \\ 0, & \text{otherwise} \end{cases} \quad \forall i, r \]
• Column c is used:
\[ \mathit{used}^i_c = \begin{cases} 1, & \text{if } u^i_{rck} = 1 \text{ for some } 0 \le r \le R-1,\ 1 \le k \le K \\ 0, & \text{otherwise} \end{cases} \quad \forall i, c \]
• Height and width of product i:
\[ H^i = h \sum_{0 \le r \le R-1} \mathit{used}^i_r, \qquad W^i = w \sum_{0 \le c \le C-1} \mathit{used}^i_c \]

Additional Placement Constraints
• Sources at the boundaries
– Memory controller channels and I/O controllers
• Contiguous tiles
• Adjacency constraints
[Figure: example placements of MC, memory controller channel (MCh) and I/O blocks]

Power- / Performance-Driven DSE
• We allow the
number of cores and memory controllers for each product to vary within a given range, for a given target thermal design power
• We add constraints on the maximum number of memory controllers in a given row or column

Developed Infrastructure
• Read a floorplan description file
• Generate the corresponding integer linear programming formulation, which is fed into CPLEX
• Solutions are written into an ASCII file describing the final floorplans of all the product classes
• The final floorplan description of each product class is printed as a PDF file

Multi-Product FP Description File:
# <#rows> × <#columns>
Biggest product grid size: 6 × 6
N_C_0: 26
N_H_0: 4
N_C_1: 18
N_H_1: 2
# max-k constraint on HAs
MC top: 1
MC bottom: 2
MC left: 1
MC right: 1
# Tile width and height information
Tile width: 2
Tile height: 1

Chopping with Four Product Classes
• S0 = 34 cores, 8 MCs; S1 = 26 cores, 4 MCs
• S2 = 18 cores, 2 MCs; S3 = 10 cores, 2 MCs
[Figure: 7 × 7 tile grid for S0 (34 Core, 8 MC and 7 Empty cells):
MC   MC   Core Core Core MC   Empty
MC   Core Core Core Core Core Empty
MC   Core Core Core Core Core Empty
Core Core Core Core Core Core Empty
Core Core Core Core Core Core Empty
Core Core Core Core Core Core Empty
MC   Core Core MC   Core MC   Empty]

Chopping with Four Product Classes
• S1 = 26 cores, 4 MCs; S2 = 18 cores, 2 MCs
[Figure: tile grid of Core and MC cells showing S1 and the region chopped to obtain S2]

Chopping with Four Product Classes
• S2 = 18 cores, 2 MCs; S3 = 10 cores, 2 MCs
[Figure: tile grid of Core and MC cells showing S2 and the region chopped to obtain S3]

Results with Memory Controller Channels
• S1 = 36
cores, 8 MCs, 8 MChs; S2 = 27 cores, 6 MCs, 6 MChs
[Figure: floorplan with cores (C), MCs, empty cells (E), memory channels MCH1 to MCH8 and an I/O block]

Results with Memory Controller Channels
• S2 = 27 cores, 6 MCs, 6 MChs; S3 = 18 cores, 4 MCs, 4 MChs
[Figure: floorplan with cores (C), MCs, empty cells (E), memory channels MCH1 to MCH6 and an I/O block]

Conclusions & Future Work
• Simultaneous floorplan optimization framework for CMPs across multiple products
• We define the concept of a choppable floorplan
– Enables us to easily derive the floorplans of smaller products from those of larger ones
• Finding choppable floorplans across multiple products reduces re-design costs and shortens time-to-market
• Future challenges
– Joint tile- and chip-level floorplanning
– Reducing the white space when

Results with Memory Controller Channels
• S1 = 36 cores, 8 MCs, 8 MChs; S2 = 27 cores, 6 MCs, 6 MChs; S3 = 18 cores, 4 MCs, 4 MChs

Test Case   # Binary Variables   # Constraints   CPU Runtime (s)
1           595                  3014            687
2           896                  6204            4744
3           1089                 7218            14936

Different Grid Size
• Grid size is 6 × 6
• Total number of tiles = 30
• Tile height = 1, Tile width = 2
• Enables exploration of different tile aspect ratios
[Figure: grid floorplan showing the placement of the MCs]

Power- / Performance-Driven DSE (2)
• We consider different width and height values for different resource types

Tile Floorplan Examples
[Figure: tile block diagrams; labels include Logic, MISC, Pipeline Stages and CORE; other block names are masked as XXX]
[Figure, continued: labels include Buffer, Router and IDI Interface; captions: Sample Core Floorplan, Sample FP for Router & Cache Controller]

Developed Infrastructure
• Read a floorplan description file
• Generate the corresponding mixed-integer programming formulation, which is fed into CPLEX
• Solutions are written into an ASCII file describing the final floorplan
• The final floorplan description is printed as a PDF file at the end

Floorplan Description File:
# <block name> <area> <minAR> <maxAR> <rotation>
BEGIN FP DESCRIPTION
X1 A1 minAR1 maxAR1 0
X2 A2 minAR2 maxAR2 1
X3 A3 minAR3 maxAR3 1
X4 A4 minAR4 maxAR4 0
END FP DESCRIPTION
# <Block 1> <Block 2> <nonoverlapping constraint>
BEGIN OVERLAP CONSTRAINTS
X2 X4 3
END OVERLAP CONSTRAINTS
# <Block 1> <Block 2>
BEGIN ADJACENCY INFO
X3 X4
END ADJACENCY INFO
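The description file above has a simple sectioned layout, so reading it can be sketched in a few lines. The reader below is a minimal illustration, not the authors' tool. It assumes whitespace-separated fields, `#` comment lines, numeric area and aspect-ratio values, and that a rotation flag of 1 means the block may be rotated. The function name `parse_tile_fp`, the dictionary keys and the sample numbers are hypothetical, and the overlap code (3 in the slide) is kept as an opaque integer because its meaning is not given.

```python
def parse_tile_fp(text):
    """Parse a tile floorplan description into blocks, overlaps, adjacency."""
    sections = {
        "FP DESCRIPTION": [],
        "OVERLAP CONSTRAINTS": [],
        "ADJACENCY INFO": [],
    }
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        if line.startswith("BEGIN "):
            current = line[len("BEGIN "):]
            if current not in sections:
                raise ValueError(f"unknown section: {current}")
        elif line.startswith("END "):
            current = None
        elif current is None:
            raise ValueError(f"data outside a section: {line}")
        else:
            sections[current].append(line.split())

    blocks = [
        {"name": name, "area": float(area), "min_ar": float(lo),
         "max_ar": float(hi), "rotatable": rot == "1"}  # assumed flag meaning
        for name, area, lo, hi, rot in sections["FP DESCRIPTION"]
    ]
    overlaps = [(a, b, int(code)) for a, b, code in sections["OVERLAP CONSTRAINTS"]]
    adjacent = [(a, b) for a, b in sections["ADJACENCY INFO"]]
    return blocks, overlaps, adjacent


# Hypothetical sample with made-up numeric values in place of A1, minAR1, ...
SAMPLE = """\
# <block name> <area> <minAR> <maxAR> <rotation>
BEGIN FP DESCRIPTION
X1 4.0 0.5 2.0 0
X2 6.0 1.0 1.0 1
END FP DESCRIPTION
# <Block 1> <Block 2> <nonoverlapping constraint>
BEGIN OVERLAP CONSTRAINTS
X1 X2 3
END OVERLAP CONSTRAINTS
# <Block 1> <Block 2>
BEGIN ADJACENCY INFO
X1 X2
END ADJACENCY INFO
"""

blocks, overlaps, adjacent = parse_tile_fp(SAMPLE)
```

Keeping the three sections as separate lists mirrors how they would feed a formulation: block records define variables, while overlap and adjacency pairs each generate constraints.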