Co-Synthesis Algorithms: Distributed System Co-Synthesis
Part of the HW/SW Codesign of Embedded Systems Course (CE 40-226)
Winter-Spring 2001
Codesign of Embedded Systems
Topics
Introduction
Preliminaries
Hardware/Software Partitioning
Distributed System Co-Synthesis
Topics
Introduction
An Integer Linear Programming Model
A Heuristic Algorithm
Introduction to Distributed System Co-Synthesis
Does not use an architectural template
Instead, creates a multiprocessor architecture during co-synthesis
The multiprocessor is usually heterogeneous in its processing elements, communication channels, and topologies
Less emphasis on the design of ASICs
More emphasis on the design of the multiprocessor topology
Introduction to Distributed System Co-Synthesis (cont’d)
Very common in practice
Especially a large CPU + small microcontrollers + small ASICs
Co-Synthesis Algorithms: Distributed System Co-Synthesis
Integer Linear Programming Model
ILP Model
Introduction
Linear Programming (LP): minimizing or maximizing a linear target function, subject to a set of linear constraints
Current algorithms either find the optimal solution or establish that the problem is not feasible at all
Example: Knapsack problem
ILP Model (cont’d)
Introduction (cont’d)
Integer Linear Programming (ILP): the integer-solution counterpart of LP
Example: Knapsack problem with an integer-solution constraint
Current algorithms find the absolute optimal solution, but they take much CPU time and are only feasible for fairly small problems
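As a small illustration of the LP/ILP distinction, the sketch below solves one knapsack instance twice: once with fractional items allowed (the LP view) and once with the integer-solution constraint (the ILP view). The item data and function names are made up for illustration, and the exhaustive search only stands in for a real ILP solver.

```python
from itertools import product

def lp_relaxed_knapsack(items, capacity):
    """Knapsack as an LP: each x_i may be any fraction in [0, 1]."""
    total, remaining = 0.0, float(capacity)
    # Taking items in order of value density is optimal for the relaxed problem.
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        fraction = min(1.0, remaining / weight)
        total += fraction * value
        remaining -= fraction * weight
        if remaining <= 0:
            break
    return total

def integer_knapsack(items, capacity):
    """Knapsack as an ILP: each x_i is 0 or 1. Exhaustive, so exponential time."""
    best = 0
    for choice in product((0, 1), repeat=len(items)):
        weight = sum(c * w for c, (_, w) in zip(choice, items))
        if weight <= capacity:
            best = max(best, sum(c * v for c, (v, _) in zip(choice, items)))
    return best

# Hypothetical (value, weight) items and capacity.
ITEMS = [(60, 10), (100, 20), (120, 30)]
lp_value = lp_relaxed_knapsack(ITEMS, 50)
ilp_value = integer_knapsack(ITEMS, 50)
print(lp_value, ilp_value)
```

The relaxed optimum is never worse than the integer one, and the integer search has to consider every 0/1 combination, which hints at why ILP is only practical for small problems.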
Prakash-Parker ILP Model
By Prakash and Parker, 1992
Developed an ILP formulation
Used general ILP solvers to solve it
Inputs to the algorithm: a single-rate task graph, and a technology model for the PEs, the communication channels, and the processes’ execution characteristics on them
Target function: Minimize system implementation cost
Constraints: Describe the requirements of the system
Prakash-Parker ILP Model (cont’d)
Algorithm classification criteria
Input Model: Single-rate task graph
Target Architecture: Distributed multiprocessor
Quantum: Processes of the task graph
Cost Estimation: Based on technology models provided to the algorithm; represented as the target function of the ILP
Prakash-Parker ILP Model (cont’d)
Algorithm classification criteria (cont’d)
Performance Estimation: Based on technology models provided to the algorithm
Scheduling, Allocation: Embedded in the ILP formulation constraints
Algorithm details
Prakash-Parker ILP Model (cont’d)
Algorithm classification criteria (cont’d)
Algorithm details
Target Function: Minimize cost
Sets of Constraints: Allocation (PEs and communication links), and Scheduling (processes on PEs, and communications on links)
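To make the shape of the problem concrete ("minimize cost subject to allocation and scheduling constraints"), here is a toy exhaustive search. It is not the Prakash-Parker formulation and does not use an ILP solver: it assumes at most one PE instance per type, ignores communication, and replaces real scheduling with a serial-load check against a deadline. All names and numbers are hypothetical.

```python
from itertools import product

def cheapest_allocation(processes, exec_time, cost, deadline):
    """Try every process-to-PE-type assignment; return (cost, assignment) or None."""
    best = None
    pe_types = sorted(cost)
    for choice in product(pe_types, repeat=len(processes)):
        assignment = dict(zip(processes, choice))
        # Scheduling stand-in: processes sharing a PE run back to back.
        load = {}
        for proc, pe in assignment.items():
            load[pe] = load.get(pe, 0) + exec_time[proc][pe]
        if any(t > deadline for t in load.values()):
            continue  # violates the timing requirement
        # Target function: total cost of the PE types actually used.
        total = sum(cost[pe] for pe in load)
        if best is None or total < best[0]:
            best = (total, assignment)
    return best

# Hypothetical technology model: run time of each process on each PE type.
EXEC_TIME = {
    "p1": {"fast": 2, "slow": 5},
    "p2": {"fast": 3, "slow": 6},
    "p3": {"fast": 4, "slow": 9},
}
COST = {"fast": 100, "slow": 40}

tight = cheapest_allocation(["p1", "p2", "p3"], EXEC_TIME, COST, deadline=8)
loose = cheapest_allocation(["p1", "p2", "p3"], EXEC_TIME, COST, deadline=9)
print(tight, loose)
```

The search space grows exponentially with the number of processes, which matches the experience that exact methods only handle small task graphs.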
Prakash-Parker ILP Model (cont’d)
Sets of Constraints (cont’d): Allocation
Processor-selection constraints: Each process must be assigned to exactly one processor (not more, not less)
Data-transfer type constraints: Each communication must be either local or multi-hop, but not both and not neither
Prakash-Parker ILP Model (cont’d)
Sets of Constraints (cont’d): Scheduling
Input-availability constraints: Data cannot be used by the sink process until after it has been produced by the source process
Output-availability constraints: Data must obey the fractional output generation parameters
Process execution start/end constraints: A process’s finish time depends on its start time and the PE on which it executes
Data-transfer start/end constraints: Similar to the previous, but using data transfer times
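The timing constraints above can be read as checks on a candidate schedule. The sketch below is one possible reading, not the original formulation: `out_frac` is a hypothetical stand-in for the fractional output generation parameter (the fraction of a source process's run time after which its output exists), and every name and number is assumed for illustration.

```python
def timing_violations(edges, start, exec_time, pe_of, xfer_start, xfer_time, out_frac):
    """Return the violated timing constraints (an empty list means none)."""
    violations = []
    # Process execution start/end: finish time follows from start time and chosen PE.
    end = {p: start[p] + exec_time[p][pe_of[p]] for p in start}
    for src, dst in edges:
        # Output availability: data exists once out_frac of the source has run.
        ready = start[src] + out_frac[(src, dst)] * (end[src] - start[src])
        if xfer_start[(src, dst)] < ready:
            violations.append(("output-availability", src, dst))
        # Data-transfer start/end: like processes, but using the transfer time.
        xfer_end = xfer_start[(src, dst)] + xfer_time[(src, dst)]
        # Input availability: the sink cannot start before the data has arrived.
        if start[dst] < xfer_end:
            violations.append(("input-availability", src, dst))
    return violations

# Hypothetical two-process example: A on a CPU feeds B on an ASIC.
EDGES = [("A", "B")]
PE_OF = {"A": "cpu", "B": "asic"}
EXEC_TIME = {"A": {"cpu": 4}, "B": {"asic": 2}}
XFER_START = {("A", "B"): 4}
XFER_TIME = {("A", "B"): 1}
OUT_FRAC = {("A", "B"): 1.0}

on_time = timing_violations(EDGES, {"A": 0, "B": 5}, EXEC_TIME, PE_OF,
                            XFER_START, XFER_TIME, OUT_FRAC)
too_early = timing_violations(EDGES, {"A": 0, "B": 4}, EXEC_TIME, PE_OF,
                              XFER_START, XFER_TIME, OUT_FRAC)
print(on_time, too_early)
```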
Prakash-Parker ILP Model (cont’d)
Sets of Constraints (cont’d): Scheduling (cont’d)
Processor-usage-exclusion: Processes on a single PE must not execute simultaneously
Communication-usage-exclusion: Multiple communications must not be scheduled on the same link simultaneously
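Both exclusion constraints say the same thing about different resources: no two activities on one PE or one link may overlap in time. A minimal sketch of that check, assuming a schedule given as half-open time intervals per resource (the data layout and names are hypothetical):

```python
from itertools import combinations

def usage_conflicts(schedule):
    """schedule: resource -> list of (name, start, end). Return overlapping pairs."""
    conflicts = []
    for resource, slots in schedule.items():
        for (a, a_start, a_end), (b, b_start, b_end) in combinations(slots, 2):
            # Intervals [start, end) overlap iff each starts before the other ends.
            if a_start < b_end and b_start < a_end:
                conflicts.append((resource, a, b))
    return conflicts

# Hypothetical schedule: the CPU is used back to back, the bus is double-booked.
SCHEDULE = {
    "cpu": [("p1", 0, 4), ("p2", 4, 7)],
    "bus": [("p1->p3", 4, 6), ("p2->p3", 5, 8)],
}
conflicts = usage_conflicts(SCHEDULE)
print(conflicts)
```

A checker like this can simply test each pair after the fact. An ILP formulation has to encode the same either/or condition with linear constraints over additional 0/1 variables.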
Prakash-Parker ILP Model (cont’d)
Experimental Results
Applied only to relatively small problems. Reason: use of general ILP solvers
Their largest task graph: 9 processes. Took 6000 CPU minutes on an unspecified processor
Significance of the work
Achieved precisely optimal solutions on the examples they could solve
Used as benchmarks for heuristic co-synthesis algorithms
Co-Synthesis Algorithms: Distributed System Co-Synthesis
Wolf’s Heuristic Algorithm
Wolf’s Heuristic Algorithm
As ever, topics of importance:
System Specification Language/Model
Target Architecture
Functionality (Allocation/Scheduling) Quantum
Allocation Strategy
Scheduling Strategy
Cost Estimation
Performance Estimation
Algorithm Details
Wolf’s Heuristic Algorithm (cont’d)
System Specification Language/Model: Algorithm input is a single-rate task graph
Target Architecture: Heterogeneous multiprocessor architecture
Functionality Quantum: Processes in a single-rate task graph
Allocation: Primal approach, in which performance is the major objective
Scheduling: ?
Wolf’s Heuristic Algorithm (cont’d)
Performance Estimation: Component Technology Library. The run-time of each process on each available PE is assumed to be known
Cost Estimation: Component Technology Library
$\text{Total Cost} = \sum_i \text{Cost}(PE_i) + \sum_j \text{Cost}(\text{Comm\_Channel}_j)$
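The total-cost formula is a plain sum over the allocated components, with each price looked up in the component technology library. A minimal sketch, using a made-up library with hypothetical component names and prices:

```python
def total_cost(pes, channels, pe_cost, channel_cost):
    """Sum the library cost of every allocated PE and communication channel."""
    return sum(pe_cost[kind] for kind in pes) + sum(channel_cost[kind] for kind in channels)

# Hypothetical component technology library (names and prices are illustrative).
PE_COST = {"big_cpu": 100, "microcontroller": 15, "asic": 40}
CHANNEL_COST = {"bus": 10, "serial_link": 3}

# One big CPU, two microcontrollers, one bus and one serial link.
cost = total_cost(
    pes=["big_cpu", "microcontroller", "microcontroller"],
    channels=["bus", "serial_link"],
    pe_cost=PE_COST,
    channel_cost=CHANNEL_COST,
)
print(cost)
```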
Algorithm Details
Wolf’s Heuristic Algorithm Details
First ignore communication costs; later, take them into account
Steps:
1. Create an initial feasible solution, and perform an initial scheduling on it. Initial feasible solution: assign each process to a separate PE
2. Reallocate processes to PEs to minimize total PE cost, possibly eliminating PEs from the initial feasible solution
3. Reallocate processes again to minimize the amount of communication required between PEs
4. Allocate communication channels
5. Allocate I/O devices (internal or external to PEs)
Wolf’s Heuristic Algorithm Details (cont’d)
The most important step is step 2, the initial reallocation
Reason: PE cost is the dominant hardware cost
Initial reallocation
1. PE cost reduction:
1.1 Scan the PEs, starting with the least-utilized PE
1.2 Try to reallocate that PE’s processes to other existing PEs
1.3 If no process is left on the PE, eliminate it; otherwise, replace the PE with a suitable lower-cost one
2. Pair-wise merge: Merge a pair of PEs into a single, more powerful one
3. Load balancing
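Phase 1 above (PE cost reduction) can be sketched roughly as follows. This is a simplified illustration rather than Wolf's actual implementation: it moves one process at a time, picks the first PE that fits, and replaces real scheduling with a serial-load check against a deadline. It also assumes the initial allocation is feasible and that every process has a known run time on every PE type. All names and numbers are hypothetical.

```python
def load(procs, kind, exec_time):
    """Serial run time of a set of processes on one PE type."""
    return sum(exec_time[p][kind] for p in procs)

def pe_cost_reduction(alloc, pe_type, exec_time, cost, deadline):
    """alloc: PE name -> set of processes; pe_type: PE name -> PE type name."""
    alloc = {pe: set(procs) for pe, procs in alloc.items()}
    pe_type = dict(pe_type)
    # 1.1 Scan the PEs, starting with the least-utilized one.
    order = sorted(alloc, key=lambda pe: load(alloc[pe], pe_type[pe], exec_time))
    for pe in order:
        # 1.2 Try to move each of its processes to another existing PE.
        for proc in sorted(alloc[pe]):
            for other in alloc:
                if other == pe:
                    continue
                if load(alloc[other] | {proc}, pe_type[other], exec_time) <= deadline:
                    alloc[other].add(proc)
                    alloc[pe].remove(proc)
                    break
        if not alloc[pe]:
            # 1.3 No process left: eliminate the PE.
            del alloc[pe]
            del pe_type[pe]
        else:
            # Otherwise replace it with the cheapest type that still fits.
            fits = [k for k in cost if load(alloc[pe], k, exec_time) <= deadline]
            pe_type[pe] = min(fits, key=cost.get)
    return alloc, pe_type

# Hypothetical technology library and initial solution (one fast PE per process).
EXEC_TIME = {
    "p1": {"fast": 2, "slow": 5},
    "p2": {"fast": 3, "slow": 6},
    "p3": {"fast": 4, "slow": 9},
}
COST = {"fast": 100, "slow": 40}
initial = {"PE0": {"p1"}, "PE1": {"p2"}, "PE2": {"p3"}}
types = {"PE0": "fast", "PE1": "fast", "PE2": "fast"}

new_alloc, new_types = pe_cost_reduction(initial, types, EXEC_TIME, COST, deadline=8)
print(new_alloc, new_types)
```

In this run, PE0 is emptied and eliminated, PE1 keeps one process and is downgraded to the cheaper type, and PE2 stays as it is, so the total PE cost drops from 300 to 140.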
Wolf’s Heuristic Algorithm Details (cont’d)
Initial reallocation (cont’d)
The “PE cost reduction” phase tries to reallocate multiple processes at a time
The above 3 phases are repeated as long as possible
Experimental results
Finds optimal solutions to most of the ILP-solved examples
Finds near-optimal solutions for the remaining examples
Showed good results on larger examples
Requires very little run-time, due to the multiple-move strategy during the PE cost minimization phase
What we learned today
Distributed System Co-Synthesis: The other broad category of co-synthesis algorithms