Algorithmic Techniques in VLSI CAD Shantanu Dutt University of Illinois at Chicago

advertisement
Algorithmic Techniques in
VLSI CAD
Shantanu Dutt
University of Illinois at Chicago
Common Algorithmic Approaches in
VLSI CAD
• Divide & Conquer (D&C) [e.g., merge-sort,
partition-driven placement, tech.mapping of
fanout-free ckt for dynamic power min.]
• Reduce & Conquer (R&C) [e.g., multilevel
techniques such as the hMetis partitioner]
• Dynamic programming [e.g., matrix
multiplication, optimal buffer insertion]
• Mathematical programming: linear, quadratic,
0/1 integer programming [e.g., floorplanning,
global placement]
Common Algorithmic Approaches in VLSI CAD
(contd)
• Search Methods:
– Depth-first search (DFS): mainly used to find any
solution when cost is not an issue [e.g., FPGA
detailed routing---cost generally determined at the
global routing phase]
– Breadth-first search (BFS): mainly used to find a soln
at min. distance from root of search tree [e.g., maze
routing when cost = dist. from root]
– Best-first search (BeFS): used to find optimal or
provably sub-optimal (at most a certain given factor of
optimal) solutions w/ any cost function, Can be done
when a provable lower-bound of the cost can be
determined for each branching choice from the
“current partial soln node” [e.g., TSP, global routing]
• Iterative Improvement: deterministic, stochastic
• Min-cost network flow
Divide & Conquer
• Determine if the problem can be solved in a hierarchical or divide-&conquer (D&C) manner:
Example from CAD:
Min-total-sw-prob. (or
min-dynamic power)
tech. mapping of a
fanout-free circuit.
Root problem A
Subprob. A1
A1,1
A1,2
Stitch-up of solns to A1 and A2
to form the complete soln to A
Subprob. A2
A2,1
Do recursively until
subprob-size is s.t. an
exhaustive based
optimal design is doable
A2,2
– D&C approach: See if the problem can be “broken up” into 2 or more smaller
subproblems that can be “stitched-up” to give a soln. to the parent prob.
– Do this recrusively for each large subprob until subprobs are small enough for an “easy”
solution technique (could be exhasutive!)
– If the subprobs are of a similar kind to the root prob then the breakup and stitching will
also be similar
–The final design may or may not be optimal (will be optimal if the problem has the
dynamic programming property; see later)
Reduce-&-Conquer
• Examples: Multilevel graph/hypergraph
partitioning (e.g., hMetis), multilevel
routing
Reduce problem size
(Coarsening)
Solve
Uncoarsen and
refine solution
Dynamic Programming (DP)
Root
Problem
A
Stitch-up
function
A1
A2
Stitch-up function f:
Optimal soln of root =
f(optimal solns of subproblems)
= f(opt(A1), opt(A2), opt(A3), opt(A4))
A3
A4
Subproblems
Reuse of subproblem soln.
• The above primary property of DPs (optimal substructure: optimal solns. of subproblems is part of optimal soln. of parent problem) also means that everytime we
optimally solve the subproblem, we can store/record the soln and reuse it
everytime it is part of the formulation of a higher-level problem. The ocurrence of
a subproblem multiple times in different higher-level problems is called the
overlapping subproblem property. It is, however, not a necessary feature of a DP
problem.
Dynamic Programming (contd.)
DP_TM(C)
• A positive example: Total wire minimization in tech. mapping in a fanout-free circuit = DP_Min(C): C is a
fanout-free ckt w/, say, z as its output. The problem is to minimize the sum of the number of outputs (each o/p
contributes to an “exposed” wire in the circuit that needs to be routed), i.e., the sum of wires at the o/ps of
TM’ed gates.
• For a cut Ci w/ z at its o/p that can be TM’ed to a gate gi in
the library, let x, y be 2 i/ps. Then the problem of
z
minimizing the # of o/p wires in T(x) and T(y) are clearly
Ci
independent problems, and the optimal soln. to each is part
of the optimal soln. to DP_Min(C, z) given the cut Ci.
• So the overall optimal formulation is to take the minimum
soln. over all feasible cuts Ci w/ z at their o/p.
• DP_TM(C) = Minall feasible Ci at z = o/p at C S xj in X(Ci) DP_TM(T(xj)),
x
where X(Ci) is the set of i/ps generated by Ci.
y
• Whichever is the min. soln. producing cut Ck, the optimal
solns. to the subproblems at its i/ps is part of the otimal
soln. for DP_TM(C).
• Thus, since the optimal substructure property holds, this
problem is amenable to dynamic programming.
DP_TM(T(x))
DP_TM(T(x))
Dynamic Programming (contd)
Root
Problem
A
Stitch-up
function
A1
A2
A3
A4
Subproblems
• Matrix multiplication example: Most computationally efficient way to perform the series of
matrix mults: M = M1 x M2 x ………….. x Mn, Mi is of size ri x ci w/ ri = ci-1 for i > 1.
• DP formulation: opt_seq(M) = (by defn) opt_seq(M(1,n))
= mini=1 to n-1 {opt_seq(M(1, i)) + opt_seq(M(i+1, n)) + r1xcixcn}
• Correctness rests on the property that the optimal way of multiplying M1x … x Mi
& Mi+1 to Mn will be used in the “min” stitch-up function to determine the optimal soln for M
• Thus if the optimal soln invloves a “cut” at Mr, then the opt_seq(M(1,r)) & opt_seq(M(r+1,n))
will be part of opt_seq(M)
• Perform computation bottom-up (smallest sequences first)
• Complexity: Note that each subseq M(j, k) will appear in the above computation and is solved
exactly once (irrespective of how many times it appears).
• Time to solve M(j, k), j < n, k >= j, not counting the time to solve its subproblems (which
are accounted for in the complexity of each M(j,k)) is (length l of seq) -1 = l-1 (since min
of l-1 different options is computed), where l = j-k+1
• # of different M(j, k)’s is of length l = n – l + 1, 2 <= l <= n.
• Total complexity = Sum l = 2 to n (l-1) (n-l+1) = Q(n3) (as opposed to, say, O(2 n) using
exhaustive search)
DP in VLSI CAD
•
•
•
•
•
•
Example for the simple problem of only an optimization objective: Min-wire cost
tech. mapping of a fanout-free circuit, where the cost is # of wires. Thus best cost
of a subproblem is easy to define and is a single value
However, in CAD, the problems are generally multi-parameter ones: one opt.
objective (min. or max.) and several upper-bound or lower-bound constraints on
several metrics/parameters
Which solution of a subproblem (i.e., a partial solution) is best is now harder to
determine among several at a particular node of the DP tree or dag (directed
acyclic graph)?
Concept of domination is now important: A partial solution X represented by a
vector of opt. and constraint metrics (a1, a2, …, ak) that is not worse in all metrics
than any other partial soln. (i.e., X is not dominated by any other partial soln. of
the same subproblem) is “best”.
So there are multiple “best” solutions of a subproblem, one or more of which can
be part of the optimal/best solution(s) of the parent problem. So after solving a
subproblem, we will get multiple solutions (partial sols. of the parent problem),
and we need to keep the non-dominated ones only and combine them w/ nondominated solns of sibling subproblems to determine solns. to the parent
problem.
Note that we need to get rid of all dominated partial solns. as they are guaranteed
not to lead to the optimal soln. of the full problem or more locally to nondominated/best solns. of the parent problem.
A DP Example: Simple Buffer Insertion
Problem
Given: Source and sink locations, sink capacitances
and RATs (reqd. arrival time), a buffer type, source
delay rules, unit wire resistance and capacitance
Buffer
RAT3
RAT4
s0
RAT2
RAT1
Courtesy: Chuck Alpert, IBM
Simple Buffer Insertion Problem (contd)
Find: Buffer locations and a routing tree such that slack (i.e.,
RAT) at the source is maximized—this gives greatest flexibility
at the source in various ways: getting +ve RATs at fanin gates
w/ fewer buffers at fanin nets, thus indirectly optimizing some
other metrics, e.g., total leakage power or total cell/gate area.
RATq(s0 )  min 1i 4{RAT ( si )  delay ( s0 , si )}
RAT3
RAT4
s0
RAT2
RAT1
Courtesy: Chuck Alpert, IBM
Possible buffer insertion points [nodes]—at and
below branch nodes, and intermediate points on
a long branchless interconnect
Slack/RAT Example
RAT = 500
delay = 400
Slack/RAT = -200
Unsynthesizable!
RAT = 400
delay = 600
RAT = 500
delay = 350
Slack/RAT = +100
RAT = 400
delay = 300
Courtesy: Chuck Alpert, IBM
Elmore Delay
A
R1
R2
B
C1
C
C2
Delay ( A  C)  R1 (C1  C2 )  R2C2
Courtesy: Chuck Alpert, IBM
(= Delay(AB) + Delay(BC)—sum of delays of
“branch-less” segments on path from AC).
Delay of a branchless seg:
Delay(AB) = res(AB)*total cap seen by this res.) +
wire delay (RwCw/2), Rw (Cw) = wire res. (cap) [wire
delay ignored above]
DP Example: Van Ginneken Buffer Insertion Algorithm [ISCAS’90]
• Associate each leaf node/sink with two metrics (Ct, Tt). Ct (cap seen) is useful as upstream
delay is dependent on Ct (how dependent will be based on usptream res. that is not
known at this point—dependent on buffer insertion or not options taken later), and this
upstream RAT dependent on both Ct and Tt.
• Downstream loading capacitance (Ct) and RAT (Tt). Want to min. Ct and max. Tt
• DP-based algo propagates potential solutions bottom-up [Van Ginneken, 90]. At each
intermediate node t (a branch node or an artificial node on a long branch/interconnect),
for each downstream soln. (Cn, Tn) do:
a) Add a wire: Ct  Cn  Cw Note: Ln below is the same as Cn
1
Tt  Tn  Rw  Ln  Rw  Cw
2
b) Subsequently add a buffer:
Ct  Cb
Ct, Tt
Cw, Rw
Cn, Tn
c) Consider both buffer and
Tt  Tn  Tb  Rb  Ln - RwCw/2
no-buffer (i.e., wire-only) solns.
Cn, Tn
Ct, Tt
among the set of solns. at t, if no domination betw. them.
d) If t is a branch node, merge 2 every pair of sub-solutions at each sub-tree: For each
Zn=(Cn,Tn), Zm=(Cm,Tm) soln. vectors in the 2 subtrees, create a soln vector Zt=(Ct,Tt)
where:
(note that wire-only/buffer
Ct  Cn  Cm
Ct, Tt
options just after this node
willbe considered after merging) Tt  min(Tn , Tm )
Cn, Tn
Courtesy: UCLA
Cm, Tm
DP Example: Van Ginneken (contd)
d) (contd.) After merging:
i.
ii.
Add a wire to each merged solution Zt (same cap. & delay change formulation as before)
Add a buffer to each Zt as before
e) Delete all dominated solutions at t: Zt1=(Ct1, Tt1) is dominated if there exists a Zt2=(Ct2,
Tt2) s.t. Ct1 >= Ct2 and Tt1 <= Tt2 (i.e., both metrics are worse)
f) The remaining soln vectors are all “optimal”/“best” solns at t, and one of them will be
part of the optimal solution at the root/driver of the net---this is the DP feature of this
algorithm
RAT3
RAT4
s0
RAT2
RAT1
DP Structure for Van Ginneken’s Algorithm
Looking at the process top-down we get the classical top-down DP-D&C structure:
Root problem  only at the root (S0), select the
solution w/ best value of the opt. metric (highest
RAT in this case), among all non-dom. solutions.
Generate all non-dominated and feasible
solutions across all possible buffer locations
on the interconnect between S0 and nearest
branch point across all “best” (non-dom.)
solns at 1st branch point.
1st (nearest) branch point (w/ a branching factor of
k >= 2) in the net routing from driver S0. Merge the
“best” solutions of the subtrees, delete dominated
and infeasible merged solutions. Rest are the set
of best (non-dom.) solutions at the branch point.
Legend:
Top-down
problem D&C
Bottom-up subsoln. generation
and merging
Subtree 1
Subtree 2
Subtree k
Solve the same problem for each routed subtree starting
from a point in the subtree just after the branch point,
and retain all “best” or non-dominated solutions.
Fig. 1: DP w/ Top-down D&C and bottom-up
solution generation
RAT3
RAT1
Fig. 3: D&C
tree for net RAT2
route in Fig. 2
RAT3
s0
RAT4
1st (nearest) branch point
RAT2
RAT4
RAT1 Fig. 2: Net Route
Van Ginneken Example
Intermediate nodes for possible
buffer location
(20,400)
Buffer
C=5, d=30
Wire
C=10,d=150
(30,250)
(5, 220)
Buffer
C=5, d=50
C=5, d=30
(45, 50)
(5, 0)
(20,100)
(5, 70)
Courtesy: Chuck Alpert, IBM
(20,400)
Wire
C=15,d=200 (for 1st subsoln)
C=15,d=120 (for 2nd subsoln)
(30,250)
(5, 220)
(20,400)
Van Ginneken Example Cont’d
(45, 50)
(5, 0)
(20,100)
(5, 70)
(30,250)
(5, 220)
(20,400)
(5,0) is inferior to (5,70). (45,50) is inferior to (20,100)
Wire C=10, d=90 (for 1st soln.)
Wire C=10, d=80 (for 2nd soln.)
(30,10)
(15, -10)
(20,100)
(5, 70)
(30,250)
(5, 220)
(20,400)
Pick solution with largest slack (max RAT), follow arrows forward
to get final complete solution
Courtesy: Chuck Alpert, IBM
Mathematical Programming
Others
Linear programming (LP)
E.g., Obj: Min 2x1-x2+x3
w/ constraints
x1+x2 <= a, x1-x3 <= b
-- solvable in polynomial time
Quadratic programming (QP)
E.g., Min. x12 – x2x3
w/ linear constraints
-- solvable in polynomial
(cubic) time w/ equality constraints
Some vars
are integers
Mixed integer linear prog (ILP)
Mixed integer quad. prog (IQP)
-- NP-hard
Some vars -- NP-hard
Mixed 0/1 integer linear prog
(0/1 ILP)
-- NP-hard
are in {0,1}
Mixed 0/1 integer quad. prog
(0/1 IQP)
-- NP-hard
0/1 ILP/QLP Examples
• Generally useful for “assignment” problems, where objects {O1, ...,
On) are to be assigned (possibly exclusively) to bins {B1, ..., Bm}
• 0/1 variable xi,j = 1 of object Oi is assigned to bin Bj
• Min-cut bi-partitioning for graphs G(V,E) can me modeled as a 0/1
IQP
IQP modeling of min-cut part.:
➢ x
i,1 = 1 => ui in V1 else ui in V2
(2nd var. xi,2 not needed due to
mutual exclusivity & implication
by xi,1).
ui
uj
➢ Edge (ui, uj) in cutset if:
xi,1 (1-xj,1) + (1-xi,1)(xj,1 ) = 1
➢ Objective function:
V2
V1
Min Sum (ui, uj) in E c(i,j) (xi,1 (1-xj,1) + (1-xi,1)(xj,1)
Constraint: Sum w(ui) xi,1 <= maxsize
➢
Example 2 for ILP/IQP: HLS Resource
Constraint Scheduling
• Constrained scheduling
– General case NP-complete
– Minimize latency given constraints on area or
the resources (ML-RCS)
– Minimize resources subject to bound on latency (MRLCS)
• Exact solution methods
– ILP: Integer Linear Programming
– Hu’s heuristic algorithm for identical processors/ALUs
• Heuristics
– List scheduling
– Force-directed scheduling
21
EE 5301 - VLSI Design
Automation I
ILP Formulation of ML-RCS
• Use binary decision variables
– i = 0, 1, ..., n
– l = 1, 2, ..., l’+1
l’ given upper-bound on latency
– xil = 1 if operation i starts at step l, 0 otherwise.
• Set of linear inequalities (constraints),
and an objective function (min latency)
• Observations
– x  0 for l  t S and l  t L
il
i
i
( t iS  ASAP ( v i ), t iL  ALAP ( v i ))
–
–
t i   l . xil
ti = start time of op i.
l l
is op vi (still) executing at step l?
?
x

1

 im
m  l  d i 1
22
EE 5301 - VLSI Design
Automation I
[Mic94] p.198
Start Time vs. Execution Time
• For each operation vi , only one start time
• If di=1, then the following questions are the same:
– Does operation vi start at step l?
– Is operation vi running at step l?
• But if di>1, then the two questions should be formulated
as:
– Does operation vi start at step l?
• Does xil = 1 hold?
– Is operation vi running at step l?l
• Does the following hold?
xim  1
?
m  l  d i 1

23
EE 5301 - VLSI Design
Automation I
Operation vi Still Running at Step l ?
• Is v9 running at step 6?
– Is x9,6 + x9,5 + x9,4 = 1 ?
4
5
6
v9
x9,6=1
4
5
6
v9
x9,5=1
4
5
6
v9
x9,4=1
• Note:
– Only one (if any) of the above three cases can happen
– To meet resource constraints, we have to ask the
same question for ALL steps, and ALL operations of
that type
24
EE 5301 - VLSI Design
Automation I
Operation vi Still Running at Step l ?
• Is vi running at step l ?
– Is xi,l + xi,l-1 + ... + xi,l-di+1 = 1 ?
l-di+1
l-di+1
l-di+1
...
...
...
l-1
l
...
l-1
vi
xi,l=1
l
vi
xi,l-1=1
25
vi
l-1
l
xi,l-di+1=1
EE 5301 - VLSI Design
Automation I
ILP Formulation of ML-RCS (cont.)
• Constraints:
– Exactly one start time per operation i:
For each i, S xi,l = 1, l in [tiS, tiL]
– Sequencing (dependency) relations must be satisfied
ti  t j  d j  (v j , vi )  E 


il
l
– Resource constraints
l
 l.x   l.x
jl
dj
l
x im  a k , k  1,  , n res , l  1,  , l  1
i :T ( v i )  k m  l  d i 1
• Objective: min
l.x
nl
l
26
EE 5301 - VLSI Design
Automation I
ILP Example
• Assume l = 4
• First, perform ASAP and ALAP
– (we can write the ILP without ASAP and ALAP, but
using ASAP and ALAP will simplify the inequalities)
NOP
1
 v1
2

v3
3
-
v4
4
 v2
-
NOP
 v6
 v8
+ v10 1
 v7
+ v9
v1

< v11 2

v3
 v6
3
-
v4
 v7  v8
v5
4
vn
NOP

v2
-
v5
+ v9
+ v10
<
NOPvn
27
EE 5301 - VLSI Design
Automation I
v11
ILP Example: Unique Start
Times Constraint
• Without using ASAP and
ALAP values:
• Using ASAP and ALAP:
x1,1  1
x2,1  1
x1,1  x1, 2  x1, 3  x1, 4  1
x3, 2  1
x2 ,1  x2 , 2  x2 , 3  x2 , 4  1
x4,3  1
x5, 4  1
...
...
x6,1  x6, 2  1
...
x11,1  x11, 2  x11, 3  x11, 4  1
28
x7 , 2  x7 ,3  1
x8,1  x8, 2  x8,3  1
x9, 2  x9,3  x9, 4  1
....
EE 5301 - VLSI Design
Automation I
ILP Example: Dependency
Constraints
• Using ASAP and ALAP, the non-trivial
inequalities are: (assuming unit delay for +
and *)
2. x  3. x  x  2. x  1  0
7,2
7 ,3
6 ,1
6, 2
2.x9 , 2  3.x9 , 3  4.x9 , 4  x8 ,1  2.x8 , 2  3.x8 , 3  1  0
2. x11, 2  3.x11, 3  4.x11, 4  x10,1  2.x10, 2  3.x10, 3  1  0
4.x5, 4  2.x7 , 2  3.x7 , 3  1  0
5.xn , 5  2. x9 , 2  3.x9 , 3  4.x9 , 4  1  0
5.xn , 5  2.x11, 2  3.x11, 3  4.x11, 4  1  0
29
EE 5301 - VLSI Design
Automation I
ILP Example: Resource
Constraints
• Resource constraints (assuming 2 adders and 2 multipliers)
x1,1  x2,1  x6,1  x8,1  2
x3, 2  x6 , 2  x7 , 2  x8, 2  2
x7 , 3  x8,3  2
x10,1  2
x9, 2  x10, 2  x11, 2  2
x4, 3  x9 ,3  x10,3  x11,3  2
x5, 4  x9, 4  x11, 4  2
• Objective:
– Since l=4 and sink has no mobility, any feasible solution is
optimum, but we can use the following anyway:
M in
x n ,1  2 . x n , 2  3 . x n , 3  4 . x n , 4
30
EE 5301 - VLSI Design
Automation I
ILP Formulation of MR-LCS
• Dual problem to ML-RCS
• Objective:
– Goal is to optimize total resource usage vector, a.
– Objective function is cTa , where entries in c
are respective area costs of resources (the ak
inequality constraint in ML-RCS is now an inequality
with the variable ak (element of a) in the RHS.
• Constraints:
– Same as ML-RCS constraints, plus:
l . x nl  l  1
– Latency constraint added:

l
31
EE 5301 - VLSI Design
Automation I
[©Gupta]
Search Techniques
1
A
A
1
B
B
C
E
3 G
G
F
Graph
dfs(v) /* for basic graph visit or for soln
finding when nodes are partial solns */
v.mark = 1;
for each (v,u) in E
if (u.mark != 1) then
dfs(u)
Algorithm Depth_First_Search
for each v in V
v.mark = 0;
for each v in V
if v.mark = 0 then
if G’s nodes are partial solns. or
“ordinary” nodes then dfs(v);
else soln_dfs(f, v);
2
D
D
D
5
F
C
4
6
2
E
B
C
3
A
6
E
7
4
G
F
5
DFS
BFS
soln_dfs(curr_path, v)
/* used when nodes are basic elts of the problem and
not partial soln nodes */
v.mark = 1; curr_path = curr_path U {v}
If path to v is a soln, then return(1, curr_path);
for each (v,u) in E
if (u.mark != 1) then
(found ,curr_path) = soln_dfs(curr_path U {v}, u)
if (found = 1) then
return(found, curr_path)
end for;
v.mark = 0; /* can visit v again to
from another soln on a different path */
return(0)
Search Techniques—Exhaustive DFS
optimal_soln_dfs(v)
/* used when nodes are basic elts of the problem and
not partial soln nodes */
begin
v.mark = 1; curr_path = curr_path U {v}’
If curr_path is a soln, then begin
if cost(curr_path) < best_cost then begin
best_soln=curr_path; best_cost=cost(curr_path);
endif
v.mark=0; return;
Endif
for each (v,u) in E
if (u.mark != 1) then
cost(curr_path) += edge_cost(v,u);
optimal_soln_dfs(curr_path, u)
end for;
v.mark = 0; /* can visit v again to
form another soln on a different path */
end
1
A
B
C
6
2
E
3 G
D
F
5
DFS
Algorithm Depth_First_Search
for each v in V
v.mark = 0;
best_cost = infinity; cost = 0;
optimal_soln_dfs(f, root);
4
9
A
Search techs for a TSP example
B
A
5
3
5
5
E
A
4
E
B
8
F
F
C
C
7
1
F
D
F
2
D
E
F
D
E
F
E
E
D
x
A
A
A
31
33
TSP graph
27
Solution nodes
Exhaustive search using DFS (w/ backtrack) for finding
an optimal solution
Y = partial soln. = a path from root to current “node” (a basic elt. of
the problem, e.g., a city in TSP, a vertex in V0 or V1 in min-cut
partitioning). We go from each such “node” u to the next one u that is
“reachable “ from u in the problem “graph” (which is part of what you
have to formulate)
Best-First Search
BeFS (root)
begin
open = {root} /* open is list of gen. but not
u
10
expanded nodes—partial solns */
costs
best_soln_cost = infinity;
(1)
while open != nullset do begin
12
15
19
curr = first(open);
if curr is a soln then return(curr) /* curr
(2)
is an optimal soln */
else children = Expand_&_est_cost(curr);
16
18
18
/* generate all children of curr & estimate
17
their costs---cost(u) should be a lower
(3)
Expand_&_est_cost(Y)
bound of cost of the best soln reachable
begin
from u */
children = nullset;
for each child in children do begin
for each basic elt x of problem “reachable” from Y &
if child is a soln then
can be part of current partial soln. Y do begin
delete all nodes w in open s.t.
if x not in Y and if feasible
cost(w) >= cost(child);
child = Y U {x};
path_cost(child) = path_cost(Y) + cost(Y, x)
endif
/* cost(Y, x) is cost of reaching x from Y */
store child in open in increasing order
est(child) = lower bound cost of best soln
of cost;
reachable from child;
endfor
cost(child) = path_cost(child) + est(child);
endwhile
children = children U {child};
end /* BFS */
endfor
return(children);
end /* Expand_&_est_cost(Y);
root
Best-First Search
Y = partial soln.
root
u
10
costs
(1)
12
15
19
(2)
18
16
18
17
(3)
Proof of optimality when cost is a LB
• The current set of nodes in “open”
represents a complete front of generated
nodes, i.e., the rest of the nodes in the
search space are descendants of “open”
(can be proved easily by induction on the
# of expansion operations with a basis
value of 0—corresponding to the root
node)
• Assuming the basic cost (cost of adding
an elt in a partial soln to contruct another
partial soln that is closer to the soln) is
non-negative, the cost is monotonic, i.e.,
cost of child >= cost of parent
• If first node curr in “open” is a soln, then
cost(curr) <= cost(w) for each w in “open”
• Cost of any node in the search space
not in “open” and not yet generated is >=
cost of its ancestor in “open” and thus >=
cost(curr). Thus curr is the optimal (mincost) soln
Search techs for a TSP example (contd)
A
A
9
A
B
5
E
B
4
3
5
5
E
C
8
F
F 8+16
F
21+6
F C
D
2
D
MST for node (A, E, F); =
MST{F,A,B,C,D}; cost=16
• Lower-bound cost estimate:
MST({unvisited cities} U
{current city} U {start city})
• This is a LB, since structure
(spanning tree) is a superset of reqd
soln structure (cycle)
• min(metric M’s values in set S)
<= min(M’s values in subset S’)
• Similarly for max??
E
F
23+8
5+15
D
F
C
E
11+14
22+9
1
Path cost for
(A,E,F) = 8
D
C
7
F
C
E D
X
X
B
F
14+9
X
F
F
A
A
27
20
BeFS for finding an optimal TSP solution
Set S of all spanning
trees in a graph G
Set S’of all Hamiltonian
paths (that visits a node
exactly once)in a graph G
S
S’
BFS for 0/1 ILP Solution
• X = {x1, …, xm} are 0/1 vars
• Choose vars Xi=0/1 as next
nodes in some order (random
or heuristic based)
root
(no vars
exp.)
X2=1
X2=0
Solve LP
w/ x2=1;
Cost=cost(LP)=C2
Solve LP
w/ x2=0;
Cost=cost(LP)=C1
X4=0
Cost relations:
C5 < C3 < C1 < C6
C2 < C1
C4 < C3
X4=1
Solve parent LP
w/ x2=1, x4=1;
Cost=cost(LP)=C4
Solve parent LP
w/ x2=1, x4=0;
Cost=cost(LP)=C3
X5=0
Solve parent LP
w/ x2=1, x4=1, x5=0
Cost=cost(LP)=C5
X5=1
Solve parent LP
w/ x2=1, x4=1, x5=1
Cost=cost(LP)=C6
optimal soln
Iterative Improvement Techniques
Iterative improvement
Deterministic Greedy
Locally/immediately
greedy
Non-locally greedy
Make move that is
Make move that is
immediately (locally) best best according to some
non-immediate (non-local)
Until (no further impr.)
metric (e.g., probability(e.g., FM)
based lookahead as in
PROP)
Until (no further impr.)
Stochastic
(non-greedy)
Make a combination of
deterministic greedy moves
and probabilistic moves that
cause a deterioration (can
help to jump out of local
minima)
Until (stopping criteria
satisfied)
• Stopping criteria could be
an upper bound on the total
# of moves or iterations
Dynamic Programming—Rough Work (Ignore for now)
SwP_Min(T(y), p(xconst,y)
SwP_Min(T(x), p(x, yconst)
• A negative example: Total sw. probability minimization in tech. mapping in a fanout-free circuit = SwP-Min(C,
p(z)): C is a fanout-free ckt w/ z as its output. The problem is to minimize the p(z) + sum of sw. probabilities (01
transition probabilities) at the o/p of TM’ed gates in C excluding z (z’s sw. prob. is included in p(z)).
• For a cut Ci w/ z at its o/p that can be TM’ed to a gate gi in the library, let
x, y be 2 i/ps. Let p(x,y) be the mapping of p(z), based on gi, in terms of
z
Ci
only the 4 transition probs. at x and y. Then, since p(x,y) is inseparable in
Sw. prob. at z
terms of the trans.probs. of x and y, the exact problem to be solved is
in terms of
various
SwP_Min(C – Ci, p(x,y)), where C-Ci has 2 o/ps x, y, and thus independent
trans. probs.
cuts have to be taken for x and y, and the combination of these 2 sets of
at all fanins
cuts will come into play. This will lead to a combinatorial explosion as we
cut by
got further down the circuit to the inputs of each pair of cuts for x and y.
subset Si(z)
y
• The final formulation is SwP-Min(C, p(z)) = Minall feasible Ci at z (SwP_Min(C –
Ci, p(X(Ci)), where X(Ci) is the set of i/ps generated by Ci.
x
• The above is not a D&C approach. In a D&C approach, we can create two
subproblems SwP_Min(T(x), p(x) = p(x, yconst)) and SwP_Min(T(y), p(y) =
p(xconst, y), where T(x) is the sub-circuit of C (a subtree) w/ x as its o/p, and
p(x, yconst) is p(x, y) assuming some constant values for the 4 trans. probs. at
y (or the subset of trans. probs. of y involved in p(x,y)).
• Since there is no guarantee, and in fact it is unlikely, that the assumed
constant values for the trans. probs. at y will be the exact trans. probs. one
obtains by optimally solving the problem SwP_Min(C – Ci, p(x,y)) (which is
the exact problem to solve), an optimal soln. to SwP_Min(T(x), p(x, yconst)) is
not guaranteed to lead to, i.e., be part of the optimal soln. to SwP_Min(C,
Fig.: D&C approach for
p(z)). A similar argument holds for the optimal soln. to and SwP_Min(T(y),
SwP_Min(C, P0->1(z))
p(xconst, y).
• Another way to look at the reason for this, is to see that the two subproblems are not independent (the trans.
probs. implied at their o/ps by their solns. is needed to solve each subproblem leading to a cyclic dependency).
• Since the above D&C seems to be the only way to break up SwP_Min(C, p(z)) into subproblems, this problem is
not amenable to DP as it does not have the optimal substructure property.
Download