Multi-Product Floorplan Optimization
Framework for Chip Multiprocessors
Marco Escalante (1), Andrew B. Kahng (2), Michael Kishinevsky (1),
Umit Ogras (3) and Kambiz Samadi (4)
(1) Intel Corp.; (2) ECE and CSE, University of California at San Diego;
(3) School of ECEE, Arizona State University; (4) Qualcomm Research
SLIP 2015
Outline
• Big Picture and Motivation
• Background on Tile-level Floorplanning
• Multi-product Chip Floorplanner
– Generic Formulation
– Choppability constraints for multi-product optimization
• Experimental results
• Conclusions and future work
Big picture
• Interconnection networks commonly used in industry
– Servers – Ring and mesh
– Graphics / Throughput computing – mesh
– Clients – Rings
• Cyclic dependency between interconnection network and floorplan
– Interconnection network depends on tile and chip floorplan
– Floorplan depends on interconnection network
[Figure: grid of core and cache tiles. Annotation: the cache should be wide enough to support the link width.]
Both floorplan and interconnect topology affect Power/Performance/Area
Current Examples: Chip-Multiprocessors
• Last level cache (LLC)
• Memory controllers (MC) & channels
• I/O controller(s)
• QPI controller(s)
• Power control unit (PCU)
• …
[Figure: low-core-count chip multiprocessor built from Core* (C), R, LLC, MC, PCU, QPI and PCIe blocks.]
• Same resources (building
blocks) used for many SKUs
* Picture of a low core count system is drawn for illustrative purposes. “Core”
box entails mid-level caches and other common blocks in all cores
Multi-product FP optimization
• Different SKUs with varying requirements
– Different number of cores, memory channels, I/O agents
– …yet share the same building blocks
Make the FP choppable: do the optimization once and re-use it for all SKUs
[Figure: IV Town – 15 cores. Floorplan with core and cache tiles surrounded by QPI, PCIe, IIO, PCU, MC and VMSE blocks.]
The Intel Xeon server processor (Haswell) had 27 different SKUs, with core counts
ranging from 4 to 18
Overview of Approach
• Goal: Develop an efficient and robust floorplan optimization
framework for server products
• Involves floorplanning at two levels of hierarchy:
– (1) tile-level, ~10-20 resources
– (2) chip-level, many tiles (> 20 tiles)
• Tile-level FP considers the physical constraints
due to interconnect
• Chip-level FP addresses choppability constraints by
simultaneously optimizing the FP across product classes
Tile Floorplanning
• Objective: Minimize area
• Subject to:
– Global routing constraints
– NoC link width
• Major resources
– Core, LLC and MLC caches, Core-MLC interface, LLC/MLC - Ring
interface, snoop filter, etc.
– Resources can be either hard or soft
– Hard blocks can rotate 90°
• Approach: Mixed-integer linear programming (MILP)
• Because tile-level FP is not the focus of the paper, only its main
distinguishing properties are mentioned
Reference: S. Sutanthavibul, E. Shragowitz and J. B. Rosen, “An Analytical Approach to Floorplan Design and
Optimization”, IEEE Trans. on CAD, 10(6), 1991, pp. 761-769.
Constraints Imposed by Chip FP
• Routing constraint: Blocks i and j should not overlap in the X and Y directions
[Figure: blocks i and j placed around the CORE and Router blocks.]
• Adjacency constraint: Blocks i and j should be adjacent
[Figure: example placements of adjacent blocks i and j.]
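The two constraints above can be illustrated with simple geometric predicates on axis-aligned rectangles. This is only a sketch: the `Rect` type, the function names and the reading of "adjacent" as "sharing an edge segment of positive length" are assumptions made for illustration, and it checks a finished placement rather than reproducing the MILP constraints used by the tile floorplanner.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    """Axis-aligned block: lower-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

def x_overlap(a: Rect, b: Rect) -> float:
    """Length of the shared X projection (negative means a gap)."""
    return min(a.x + a.w, b.x + b.w) - max(a.x, b.x)

def y_overlap(a: Rect, b: Rect) -> float:
    """Length of the shared Y projection (negative means a gap)."""
    return min(a.y + a.h, b.y + b.h) - max(a.y, b.y)

def overlaps(a: Rect, b: Rect) -> bool:
    """Blocks overlap only if their projections overlap in both X and Y."""
    return x_overlap(a, b) > 0 and y_overlap(a, b) > 0

def adjacent(a: Rect, b: Rect) -> bool:
    """Blocks touch along an edge: zero gap on one axis, shared length on the other."""
    dx, dy = x_overlap(a, b), y_overlap(a, b)
    return (dx == 0 and dy > 0) or (dy == 0 and dx > 0)

# Hypothetical block sizes, chosen only to exercise the predicates.
core = Rect(0, 0, 4, 3)
router = Rect(4, 1, 2, 2)   # abuts the right edge of the core
far_block = Rect(7, 0, 1, 1)
print(adjacent(core, router), overlaps(core, router), adjacent(core, far_block))
```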
Chip Level Floorplan Overview
• The floorplan of each product class can be derived easily through
a chopping operation
• Differences with respect to tile floorplan
– Overlap constraints are met by default
– Integer linear programming formulation
– Simultaneous floorplan optimization across multiple product classes
[Figure: product classes P1, P2 and P3 on a grid of Core and MC tiles indexed by row (y) and column (x), with the chopped regions marked.]
Preliminaries and Notations
• We use 1-hot binary variables $u_{ij}$ such that
– $u_{ij} = 1$ means the cell (i,j) is occupied
– $u_{ij} = 0$ means the cell (i,j) is empty
• We need to extend the definition to multiple floorplans
– $u^s_{ij}$ represents the cell (i,j) in FP “s”
– Multiple types of cells: Core, Memory Controller (MC), Empty
– $u^s_{ij0}$ means an empty cell at (i,j) in FP “s”
– $u^s_{ij1}$ means a Core at cell (i,j) in FP “s”
– $u^s_{ij2}$ means an MC at cell (i,j) in FP “s”
– Our formulation can consider k resource types
• Example at the right-hand side
– $u^0_{001}$ (Core), $u^0_{011}$ (Core)
– $u^0_{101}$ (Core), $u^0_{112}$ (MC)
[Figure: 2 × 2 grids with cells (0,0), (0,1), (1,0), (1,1) for floorplans S0, S1 and S2; legend: Core, MC, Empty.]
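The notation above can be made concrete with a small sketch that builds the 1-hot variables from a grid of cell types. The function name `encode` and the nested-list layout `u[s][i][j][k]` are assumptions chosen for illustration; the type indices (0 = Empty, 1 = Core, 2 = MC) follow the slide.

```python
# Resource-type indices as defined on the slide: 0 = Empty, 1 = Core, 2 = MC.
EMPTY, CORE, MC = 0, 1, 2
NUM_TYPES = 3

def encode(floorplans):
    """Return u with u[s][i][j][k] == 1 iff cell (i, j) of floorplan s holds type k."""
    return [[[[1 if cell == k else 0 for k in range(NUM_TYPES)]
              for cell in row]
             for row in fp]
            for fp in floorplans]

# 2 x 2 floorplan matching the example values on the slide:
# (0,0), (0,1) and (1,0) are Cores, (1,1) is an MC.
s0 = [[CORE, CORE],
      [CORE, MC]]
u = encode([s0])
print(u[0][0][0][CORE], u[0][0][1][CORE], u[0][1][0][CORE], u[0][1][1][MC])
```

Because exactly one of the k entries is set per cell, the encoding also expresses the rule that a tile holds only one type of resource.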
Generic Problem Formulation
• Goal: find the $\{u^s_{ijk}\}$ that minimize the sum of the half-perimeters of all products
• Constraints on number of resources
– Each tile can be occupied by only one type of resource
– Each product has a specified number of instances of each resource
• Monotonicity constraints: Suppose product i can be chopped to j
[Figure: 2 × 2 grids for floorplans S0, S1 and S2; legend: Core, MC, Empty.]
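One way to write the formulation above as a single integer program is sketched below. It is an interpretation rather than the exact model from the paper: $W^s$ and $H^s$ denote the width and height of product s, $N^s_k$ the required number of instances of resource type k, and the products are renamed a and b in the monotonicity constraint to avoid clashing with the cell indices i and j. The inequality form of the monotonicity constraint is an assumption.

```latex
\begin{aligned}
\min_{u}\quad & \sum_{s} \left( W^{s} + H^{s} \right) \\
\text{s.t.}\quad
& \sum_{k=0}^{K} u^{s}_{ijk} = 1
  && \forall s, i, j \quad \text{(one resource type per tile)} \\
& \sum_{i}\sum_{j} u^{s}_{ijk} = N^{s}_{k}
  && \forall s,\ 1 \le k \le K \quad \text{(resource counts)} \\
& u^{b}_{ijk} \le u^{a}_{ijk}
  && \forall i, j,\ 1 \le k \le K,\ \text{if product } a \text{ can be chopped to } b \\
& u^{s}_{ijk} \in \{0, 1\}
\end{aligned}
```

Under this reading, the last inequality says that a cell occupied in the smaller product b must hold the same resource in the larger product a, so b is obtained from a only by turning cells into Empty.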
Choppability
• Solution = finding the $\{u^s_{ijk}\}$
• Example at the right-hand side
– $\{u^0_{000}, u^0_{001}\} = \{0,1\}$ (Core), $\{u^0_{010}, u^0_{011}\} = \{0,1\}$ (Core)
– $\{u^0_{100}, u^0_{101}\} = \{0,1\}$ (Core), $\{u^0_{110}, u^0_{111}\} = \{1,0\}$ (MC)
– $\{u^1_{000}, u^1_{001}\} = \{0,0\}$ (Empty), $\{u^1_{010}, u^1_{011}\} = \{0,1\}$ (Core)
– $\{u^1_{100}, u^1_{101}\} = \{0,0\}$ (Empty), $\{u^1_{110}, u^1_{111}\} = \{1,0\}$ (MC)
– $\{u^2_{000}, u^2_{001}\} = \{0,0\}$ (Empty), $\{u^2_{010}, u^2_{011}\} = \{0,1\}$ (Core)
– $\{u^2_{100}, u^2_{101}\} = \{0,0\}$ (Empty), $\{u^2_{110}, u^2_{111}\} = \{0,0\}$ (Empty)
• Chopping a cell means a Core or MC is converted to Empty
[Figure: 2 × 2 grids for floorplans S0, S1 and S2; legend: Core, MC, Empty.]
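Since chopping only converts Core or MC cells to Empty, choppability between two floorplans can be checked cell by cell. The sketch below assumes both floorplans are given on the same grid; the function name `can_chop` is hypothetical, and the three grids are the S0, S1 and S2 cell types listed in the example.

```python
EMPTY, CORE, MC = 0, 1, 2

def can_chop(larger, smaller):
    """True if every occupied cell of `smaller` holds the same resource in `larger`."""
    return all(small == EMPTY or small == big
               for big_row, small_row in zip(larger, smaller)
               for big, small in zip(big_row, small_row))

s0 = [[CORE, CORE], [CORE, MC]]
s1 = [[EMPTY, CORE], [EMPTY, MC]]
s2 = [[EMPTY, CORE], [EMPTY, EMPTY]]

# S0 can be chopped to S1, S1 to S2, but S2 cannot be "chopped" back to S0.
print(can_chop(s0, s1), can_chop(s1, s2), can_chop(s2, s0))
```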
Core/MC Count Constraints
• Assume
– $N^s_{Core}$ = Number of cores in FP “s”
– $N^s_{MC}$ = Number of MCs in FP “s”
• Core count: $\sum_i \sum_j u^s_{ij1} = N^s_{Core}$, for $s = 0, 1, 2$
– $u^s_{ij1} = 1$ only if there is a Core in the cell
• MC count: $\sum_i \sum_j u^s_{ij2} = N^s_{MC}$, for $s = 0, 1, 2$
– $u^s_{ij2} = 1$ only if there is an MC in the cell
• Example: $N^0_{Core} = 3$, $N^1_{Core} = 1$, $N^2_{Core} = 1$, $N^0_{HA} = 1$, …
[Figure: 2 × 2 grids for floorplans S0, S1 and S2; legend: Core, MC, Empty.]
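The count constraints can be checked numerically by summing the 1-hot variables over all cells. The helper names `one_hot` and `count` are assumptions made for this sketch, and the three grids are the S0, S1 and S2 example floorplans from the choppability slide.

```python
EMPTY, CORE, MC = 0, 1, 2

def one_hot(fp, num_types=3):
    """u_s[i][j][k] == 1 iff cell (i, j) of floorplan fp holds type k."""
    return [[[int(cell == k) for k in range(num_types)] for cell in row]
            for row in fp]

def count(u_s, k):
    """Sum of u^s_{ijk} over all cells (i, j)."""
    return sum(cell[k] for row in u_s for cell in row)

floorplans = [
    [[CORE, CORE], [CORE, MC]],       # S0
    [[EMPTY, CORE], [EMPTY, MC]],     # S1
    [[EMPTY, CORE], [EMPTY, EMPTY]],  # S2
]
counts = [(count(one_hot(fp), CORE), count(one_hot(fp), MC)) for fp in floorplans]
print(counts)  # (cores, MCs) per floorplan
```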
Height and Width Computations
• To express area, we need a way of representing height and width; with
multiple floorplans there is one height and one width per floorplan
• For each product class i:
– Row r is used: $used^i_r = 1$ if $\sum_{0 \le c \le C-1} \sum_{1 \le k \le K} u^i_{rck} \ge 1$, and 0 otherwise
– Column c is used: $used^i_c = 1$ if $\sum_{0 \le r \le R-1} \sum_{1 \le k \le K} u^i_{rck} \ge 1$, and 0 otherwise
– Height: $H^i = h \cdot \sum_{0 \le r \le R-1} used^i_r$
– Width: $W^i = w \cdot \sum_{0 \le c \le C-1} used^i_c$
– Here h and w are the tile height and tile width
[Figure: 2 × 2 grids for floorplans S0, S1 and S2; legend: Core, MC, Empty.]
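The height and width definitions translate directly into code: a row or column counts as used when it contains at least one non-empty cell. The sketch below is illustrative only; the function names and the `tile_w` / `tile_h` parameters are assumptions, and the grids are again the S0, S1 and S2 example floorplans.

```python
EMPTY, CORE, MC = 0, 1, 2

def dimensions(fp, tile_w=1, tile_h=1):
    """Return (width, height) of a floorplan from its used columns and rows."""
    used_rows = [any(cell != EMPTY for cell in row) for row in fp]
    used_cols = [any(cell != EMPTY for cell in col) for col in zip(*fp)]
    return tile_w * sum(used_cols), tile_h * sum(used_rows)

def total_half_perimeter(floorplans, tile_w=1, tile_h=1):
    """Objective value: sum over products of width + height."""
    return sum(w + h
               for w, h in (dimensions(fp, tile_w, tile_h) for fp in floorplans))

s0 = [[CORE, CORE], [CORE, MC]]
s1 = [[EMPTY, CORE], [EMPTY, MC]]
s2 = [[EMPTY, CORE], [EMPTY, EMPTY]]

print(dimensions(s1))                       # one used column, two used rows
print(total_half_perimeter([s0, s1, s2]))   # unit tiles
```

Passing `tile_w=2, tile_h=1` evaluates the same floorplans with non-square tiles.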
Additional Placement Constraints
• Resources at the boundaries
– Memory controller channels and I/O controllers
• Contiguous tiles
• Adjacency constraints
[Figure: example grids with memory controller channel (MCh) and I/O blocks along the chip boundary, next to MC tiles.]
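The boundary requirement can be checked on a candidate floorplan with a few lines of code. This is a sketch under assumptions: the function name `all_on_boundary` and the string cell labels are hypothetical, and the check only validates a given grid rather than encoding the constraint in the ILP.

```python
def all_on_boundary(fp, kinds):
    """True if every cell whose type is in `kinds` lies on the outer ring of the grid."""
    last_r, last_c = len(fp) - 1, len(fp[0]) - 1
    return all(r in (0, last_r) or c in (0, last_c)
               for r, row in enumerate(fp)
               for c, cell in enumerate(row)
               if cell in kinds)

# Hypothetical 3 x 3 grids used only to exercise the check.
ok_grid = [["MCh", "I/O", "MCh"],
           ["MCh", "Core", "Core"],
           ["Core", "Core", "I/O"]]
bad_grid = [["Core", "Core", "Core"],
            ["Core", "MCh", "Core"],
            ["Core", "Core", "Core"]]
print(all_on_boundary(ok_grid, {"MCh", "I/O"}), all_on_boundary(bad_grid, {"MCh", "I/O"}))
```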
Power- / Performance-Driven DSE
• We allow the number of cores and memory controllers for each product
to vary within a given range, given a target thermal design power
• We add constraints on the maximum number of memory controllers in a
given row or column
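The per-row and per-column limit on memory controllers can be verified on a candidate floorplan as follows. The function name and the example grid are assumptions made for illustration; in the framework itself the limit is imposed as a constraint in the optimization rather than checked afterwards.

```python
EMPTY, CORE, MC = 0, 1, 2

def max_per_row_and_column(fp, kind):
    """Return (largest count of `kind` in any row, largest count in any column)."""
    max_row = max(sum(cell == kind for cell in row) for row in fp)
    max_col = max(sum(cell == kind for cell in col) for col in zip(*fp))
    return max_row, max_col

# Hypothetical 3 x 3 floorplan with three MCs.
fp = [[MC, CORE, MC],
      [CORE, CORE, CORE],
      [MC, CORE, CORE]]
max_row, max_col = max_per_row_and_column(fp, MC)
print(max_row, max_col)
print(max(max_row, max_col) <= 2)  # satisfies a limit of 2 MCs per row or column
```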
Developed Infrastructure
• Read a floorplan description file
• Generate the corresponding integer linear programming formulation,
which is fed into CPLEX
• Solutions are written to an ASCII file describing the final floorplans
of all the product classes
• The final floorplan of each product class is printed as a PDF file
# <#rows> × <#columns>
Biggest product grid size: 6 × 6
N_C_0: 26
N_H_0: 4
N_C_1: 18
N_H_1: 2
# max-k constraint on HAs
MC top: 1
MC bottom: 2
MC left: 1
MC right: 1
# Tile width and height information
Tile width: 2
Tile height: 1
Multi-Product FP Description File
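A reader for a description file of this shape could look like the sketch below. It is not the authors' tool: the function name `parse_description` is hypothetical, and the parser assumes every non-comment line is `key: value` with an integer value, except for the grid size, which is written as `rows × columns`.

```python
def parse_description(text):
    """Parse 'key: value' lines into a dict; lines starting with '#' are comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if "×" in value or "x" in value:
            # Grid size such as "6 × 6" becomes a (rows, columns) tuple.
            rows, cols = value.replace("x", "×").split("×")
            config[key.strip()] = (int(rows), int(cols))
        else:
            config[key.strip()] = int(value)
    return config

sample = """\
# <#rows> × <#columns>
Biggest product grid size: 6 × 6
N_C_0: 26
N_H_0: 4
N_C_1: 18
N_H_1: 2
# max-k constraint on HAs
MC top: 1
MC bottom: 2
MC left: 1
MC right: 1
# Tile width and height information
Tile width: 2
Tile height: 1
"""
config = parse_description(sample)
print(config["Biggest product grid size"], config["N_C_0"], config["Tile width"])
```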
Chopping with Four Product Classes
• S0 = 34 cores, 8 MCs → S1 = 26 cores, 4 MCs
• S2 = 18 cores, 2 MCs → S3 = 10 cores, 2 MCs
[Figure: S0 floorplan on a 7 × 7 grid; cells listed in extraction order, seven per line.]
MC    MC    Core  Core  Core  MC    Empty
MC    Core  Core  Core  Core  Core  Empty
MC    Core  Core  Core  Core  Core  Empty
Core  Core  Core  Core  Core  Core  Empty
Core  Core  Core  Core  Core  Core  Empty
Core  Core  Core  Core  Core  Core  Empty
MC    Core  Core  MC    Core  MC    Empty
Chopping with Four Product Classes
• S1 = 26 cores, 4 MCs → S2 = 18 cores, 2 MCs
[Figure: floorplans of Core and MC tiles for S1 and S2.]
Chopping with Four Product Classes
• S2 = 18 cores, 2 MCs → S3 = 10 cores, 2 MCs
[Figure: floorplans of Core and MC tiles for S2 and S3.]
Results with Memory Controller Channels
• S1 = 36 cores, 8 MCs, 8 MChs → S2 = 27 cores, 6 MCs, 6 MChs
[Figure: floorplan with C, E and MC cells, channels MCH1 to MCH8 and an I/O block.]
Results with Memory Controller Channels
• S2 = 27 cores, 6 MCs, 6 MChs → S3 = 18 cores, 4 MCs, 4 MChs
[Figure: floorplan with C, E and MC cells, channels MCH1 to MCH6 and an I/O block.]
Conclusions & Future works
• Simultaneous floorplan optimization framework for CMPs
across multiple products
• We define the concept of a choppable floorplan
– Enables us to easily derive the floorplan of smaller products
from those of larger
• Finding choppable floorplans across multiple products reduces
re-design costs and shortens time-to-market
• Future challenges
– Joint tile and chip level floorplanning
– Reducing the white space when
Results with Memory Controller Channels
• S1 = 36 cores, 8 MCs, 8 MChs → S2 = 27 cores, 6 MCs, 6 MChs →
S3 = 18 cores, 4 MCs, 4 MChs

Test Case | # Binary Variables | # Constraints | CPU Runtime (s)
1         | 595                | 3014          | 687
2         | 896                | 6204          | 4744
3         | 1089               | 7218          | 14936
Different Grid Size
• Grid size is 6 × 6
• Total number of tiles = 30
• Tile height = 1, Tile width = 2
• Enables exploration of different tile aspect ratios
[Figure: floorplan on the 6 × 6 grid with MC tiles marked.]
Power- / Performance-Driven DSE (2)
• We consider different width and height values for different resource types
Tile Floorplan Examples
[Figures: “Sample Core Floorplan” and “Sample FP for Router & Cache Controller”. Recoverable block labels: Logic, MISC, Pipeline Stages, CORE, Out. Buffer, Router, IDI Interface; the remaining labels are masked as XXX.]
Developed Infrastructure
• Read a floorplan description file
• Generate the corresponding mixed-integer programming formulation,
which is fed into CPLEX
• Solutions are written to an ASCII file describing the final floorplan
• The final floorplan description is printed as a PDF file at the end
# <block name> <area> <minAR> <maxAR> <rotation>
BEGIN FP DESCRIPTION
X1 A1 minAR1 maxAR1 0
X2 A2 minAR2 maxAR2 1
X3 A3 minAR3 maxAR3 1
X4 A4 minAR4 maxAR4 0
END FP DESCRIPTION
# <Block 1> <Block 2> <nonoverlapping constraint>
BEGIN OVERLAP CONSTRAINTS
X2 X4 3
END OVERLAP CONSTRAINTS
#<Block 1> <Block 2>
BEGIN ADJACENCY INFO
X3 X4
END ADJACENCY INFO
Floorplan Description File
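The sectioned format above can be read with a small state machine that tracks the current BEGIN/END section. This is a sketch, not the authors' parser: the function name `parse_sections` is hypothetical, fields are kept as strings, and the numeric values in the sample are made up because the slide shows only symbolic placeholders (A1, minAR1, and so on).

```python
def parse_sections(text):
    """Group whitespace-separated records by their enclosing BEGIN ... / END ... section."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("BEGIN "):
            current = line[len("BEGIN "):]
            sections[current] = []
        elif line.startswith("END "):
            current = None
        elif current is not None:
            sections[current].append(line.split())
    return sections

# Hypothetical values: <block name> <area> <minAR> <maxAR> <rotation>
sample = """\
BEGIN FP DESCRIPTION
X1 4.0 0.5 2.0 0
X2 6.0 0.5 2.0 1
X3 2.5 1.0 1.0 1
X4 3.0 0.8 1.25 0
END FP DESCRIPTION
# <Block 1> <Block 2> <nonoverlapping constraint>
BEGIN OVERLAP CONSTRAINTS
X2 X4 3
END OVERLAP CONSTRAINTS
#<Block 1> <Block 2>
BEGIN ADJACENCY INFO
X3 X4
END ADJACENCY INFO
"""
sections = parse_sections(sample)
print(sorted(sections))
print(sections["OVERLAP CONSTRAINTS"], sections["ADJACENCY INFO"])
```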